Graduate Texts
in Mathematics


C.T.J. Dodson
T. Poston

Tensor 

Geometry 

The Geometric Viewpoint 
and its Uses 

Second Edition 



Springer 



Graduate Texts in Mathematics 


Editorial Board 
J. H. Ewing 
F. W. Gehring 
P. R. Halmos 


Springer-Verlag Berlin Heidelberg GmbH 





Graduate Texts in Mathematics 


1 Takeuti/Zaring. Introduction to Axiomatic Set Theory. 2nd ed. 

2 Oxtoby. Measure and Category. 2nd ed. 

3 Schaefer. Topological Vector Spaces.

4 Hilton/Stammbach. A Course in Homological Algebra. 

5 Maclane. Categories for the Working Mathematician. 

6 Hughes/Piper. Projective Planes. 

7 Serre. A Course in Arithmetic. 

8 Takeuti/Zaring. Axiomatic Set Theory. 

9 Humphreys. Introduction to Lie Algebras and Representation Theory. 

10 Cohen. A Course in Simple Homotopy Theory. 

11 Conway. Functions of One Complex Variable. 2nd ed. 

12 Beals. Advanced Mathematical Analysis. 

13 Anderson/Fuller. Rings and Categories of Modules. 

14 Golubitsky/Guillemin. Stable Mappings and Their Singularities. 

15 Berberian. Lectures in Functional Analysis and Operator Theory. 

16 Winter. The Structure of Fields. 

17 Rosenblatt. Random Processes. 2nd ed. 

18 Halmos. Measure Theory. 

19 Halmos. A Hilbert Space Problem Book. 2nd ed., revised. 

20 Husemoller. Fibre Bundles. 2nd ed. 

21 Humphreys. Linear Algebraic Groups.

22 Barnes/Mack. An Algebraic Introduction to Mathematical Logic. 

23 Greub. Linear Algebra. 4th ed. 

24 Holmes. Geometric Functional Analysis and its Applications. 

25 Hewitt/Stromberg. Real and Abstract Analysis. 

26 Manes. Algebraic Theories. 

27 Kelley. General Topology. 

28 Zariski/Samuel. Commutative Algebra. Vol. I. 

29 Zariski/Samuel. Commutative Algebra. Vol. II. 

30 Jacobson. Lectures in Abstract Algebra I: Basic Concepts. 

31 Jacobson. Lectures in Abstract Algebra II: Linear Algebra.

32 Jacobson. Lectures in Abstract Algebra III: Theory of Fields and Galois Theory. 

33 Hirsch. Differential Topology. 

34 Spitzer. Principles of Random Walk. 2nd ed. 

35 Wermer. Banach Algebras and Several Complex Variables. 2nd ed. 

36 Kelley/Namioka et al. Linear Topological Spaces. 

37 Monk. Mathematical Logic. 

38 Grauert/Fritzsche. Several Complex Variables. 

39 Arveson. An Invitation to C*-Algebras. 

40 Kemeny/Snell/Knapp. Denumerable Markov Chains. 2nd ed. 

41 Apostol. Modular Functions and Dirichlet Series in Number Theory. 

42 Serre. Linear Representations of Finite Groups. 

43 Gillman/Jerison. Rings of Continuous Functions. 

44 Kendig. Elementary Algebraic Geometry. 

45 Loeve. Probability Theory I. 4th ed. 

46 Loeve. Probability Theory II. 4th ed. 

47 Moise. Geometric Topology in Dimensions 2 and 3.





C.T.J. Dodson T. Poston 


Tensor Geometry 

The Geometric Viewpoint and its Uses 


With 177 Figures 
Second Edition 



Springer 





Christopher Terence John Dodson 
Department of Mathematics 
University of Manchester 
Institute of Science and Technology 
Manchester M60 1QD 
United Kingdom 
e-mail: dodson@umist.ac.uk 


Timothy Poston 
14 White Church Road 
0513 Singapore 
Singapore 


Editorial Board 
J. H. Ewing 

Department of Mathematics 
Indiana University 
Bloomington, IN 47405, USA 

P. R. Halmos 

Department of Mathematics 
Santa Clara University 
Santa Clara, CA 95053, USA 


F. W. Gehring 

Department of Mathematics 
University of Michigan 
Ann Arbor, MI 48109, USA 


The first edition of this book was published by 
Pitman Publishing Ltd., London, in 1977 

Corrected Second Printing of the Second Edition 1997 


Mathematics Subject Classification (1991): 53-XX, 15-XX 


Library of Congress Cataloging-in-Publication Data 
Dodson, C. T. J. 

Tensor geometry : the geometric viewpoint and its uses / C.T.J.

Dodson, T. Poston. — 2nd ed. 

p. cm. — (Graduate texts In mathematics ; 130) 

"Second corrected printing"—T.p. verso. 

Includes bibliographical references (p. - ) and index.

ISBN 978-3-662-13117-6 ISBN 978-3-642-10514-2 (eBook) 

DOI 10.1007/978-3-642-10514-2 

1. Geometry, Differential. 2. Calculus of tensors. I. Poston, 

T. II. Title. III. Series. 

QA649.D6 1991 

516.3'6—dc21 97-13430 

CIP 

ISSN 0072-5285 

ISBN 978-3-662-13117-6 

This work is subject to copyright. All rights are reserved, whether the whole or part of the material 
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, 
broadcasting, reproduction on microfilms or in any other way, and storage in data banks. 
Duplication of this publication or parts thereof is permitted only under the provisions of the 
German Copyright Law of September 9, 1965, in its current version, and permission for use must
always be obtained from Springer-Verlag Berlin Heidelberg GmbH. 

Violations are liable for prosecution under the German Copyright Law. 

© Springer-Verlag Berlin Heidelberg 1991 

Originally published by Springer-Verlag Berlin Heidelberg New York in 1991 
Softcover reprint of the hardcover 2nd edition 1991 

SPIN 11002116 41/3111-54321- Printed on acid-free paper 





Preface to the Second Printing 
of the Second Edition 


This edition is essentially a reprinting of the Second Edition, with the addition
of two items to the Supplementary Bibliography, namely, Dodson and
Parker: A User’s Guide to Algebraic Topology, and Gray: Modern Differential 
Geometry of Curves and Surfaces. 

This latter text is very important since it contains Mathematica programs 
to perform all of the essential differential geometric operations on curves and 
surfaces in 3-dimensional Euclidean space. The programs are available by 
anonymous ftp from bianchi.umd.edu/pub/ and are being used as support 
for a course at, among other places, UMIST:
http://www.ma.umist.ac.uk/kd/ma351/ma351.html .

June 1997 Kit Dodson 

Manchester, U.K. 

Tim Poston 
Singapore 
Thanks are due to the Springer staff in Heidelberg for their enthusiastic support
and to the typist, Armin Kollner, for the excellence of the final result.
Once again, it has been achieved with the authors in yet two other countries. 

November 1990 Kit Dodson 

Toronto, Canada 
Introduction


The title of this book is misleading. 

Any possible title would mislead somebody. “Tensor Analysis” suggests 
to a mathematician an ungeometric, manipulative debauch of indices, with 
tensors ill-defined as “quantities that transform according to” unspeakable 
formulae. “Differential Geometry” would leave many a physicist unaware that 
the book is about matters with which he is very much concerned. We hope 
that “Tensor Geometry” will at least lure both groups to look more closely. 

Most modern “differential geometry” texts use a coordinate-free notation 
almost throughout. This is excellent for a coherent understanding, but leaves 
the physics student quite unequipped for the physical literature, or for the 
specific physical computations in which coordinates are unavoidable. Even 
when the relation to classical notation is explained, as in the magnificent 
[Spivak], pseudo-Riemannian geometry is barely touched on. This is crippling 
to the physicist, for whom spacetime is the most important example, and 
perverse even for the geometer. Indefinite metrics arise as easily within pure 
mathematics (for instance in Lie group theory) as in applications, and the 
mathematician should know the differences between such geometries and the 
positive definite type. In this book therefore we treat both cases equally, and 
describe both relativity theory and (in Ch. IX, §6) an important “abstract” 
pseudo-Riemannian space, SL(2;R).

The argument is largely carried in modern, intrinsic notation which lends 
itself to an intensely geometric (even pictorial) presentation, but a running 
translation into indexed notation explains and derives the manipulation rules 
so beloved of, and necessary to, the physical community. Our basic notations 
are summarised in Ch. 0, along with some basic physics. 

Einstein’s system of 1905 deduced everything from the Principle of Relativity:
that no experiment whatever can define for an observer his “absolute
speed”. Minkowski published in 1907 a geometric synthesis of this work, replacing
the once separately absolute space and time of physics by an absolute
four dimensional spacetime. Einstein initially resisted this shift away from 
argument by comparison of observers, but was driven to a more “spacetime 
geometric” view in his effort to account for gravitation, which culminated 
in 1915 with General Relativity. For a brilliant account of the power of the
Principle of Relativity used directly, see [Feynman]; particularly the deduction
(vol.2, p. 13-16) of magnetic effects from the laws of electrostatics. It
is harder to maintain this approach when dealing with the General theory. 
The Equivalence Principle (the most physical assumption used) is hard even 
to state precisely without the geometric language of covariant differentiation, 
while Einstein’s Equation involves sophisticated geometric objects. Before any 
detailed physics, therefore, we develop the geometrical setting: Chapters I - X 
are a geometry text, whose material is chosen with an eye to physical usefulness.
The motivation is largely geometric also, for accessibility to mathematics
students, but since physical thinking occasionally offers the most direct insight 
into the geometry, we cover in Ch. 0, §3 those elementary facts about special 
relativity that we refer to before Ch. XI. British students of either mathematics
or physics should usually know this much before reaching university, but
variations in educational systems - and students - are immense. 

The book’s prerequisites are some mathematical or physical sophistication, 
the elementary functions (log, exp, cos, cosh, etc.), plus the elements of vector 
algebra and differential calculus, taught in any style at all. Chapter I will 
be a recapitulation and compendium of known facts, geometrically expressed, 
for the student who has learnt “Linear Algebra”. The student who knows 
the same material as “Matrix Theory” will need to read it more carefully, as 
the style of argument will be less familiar. (S)he will be well advised to do a 
proportion of the exercises, to consolidate understanding on matters like “how 
matrices multiply” which we assume familiar from some point of view. The 
next three chapters develop affine and linear geometry, with material new to 
most students and so more slowly taken. Chapter V sets up the algebra of 
tensors, handling both ends and the middle of the communication gap that 
made 874 U.S. “active research physicists” [Miller] rank “tensor analysis” 
ninth among all Math courses needed for physics Ph.D. students, more than 
80% considering it necessary, while “multilinear algebra” is not among the first 
25, less than 20% in each specialisation recommending it. “Multilinear algebra”
is just the algebra of the manipulations, differentiation excepted, that make 
up “tensor analysis”. 

Chapter VI covers those facts about continuity, compactness and so on 
needed for precise argument later; we resisted the temptation to write a topology
text. Chapter VII treats differential calculus “in several variables”, namely
between affine spaces. The affine setting makes the “local linear approximation”
character of the derivative much more perspicuous than does a use of
vector spaces only, which permit much more ambiguity as to “where vectors 
are”. This advantage is increased when we go on to construct manifolds; modelling
them on affine spaces gives an unusually neat and geometric construction
of the tangent bundle and its own manifold structure. These once set up, we 
treat the key facts about vector fields, previously met as “first order differential
equations” by many readers. To keep the book self-contained we show
the existence and smoothness of flows for vector fields (solutions to equations) 
in an Appendix, by a recent, simple and attractively geometric proof due to 
Sotomayor. The mathematical sophistication called for is greater than for the 
body of the book, but so is that which makes a student want a proof of this 
result. 

Chapter VIII begins differential geometry proper with the theory of connections,
and their several interrelated geometric interpretations. The “rolling
tangent planes without slipping” picture allows us to “see” the connection 
between tangent spaces along a curve in an ordinary embedded surface, while 
the intrinsic geometry of the tangent bundle formulation gives a tool both 
mathematically simpler in the end, and more appropriate to physics. 

Chapter IX discusses geodesics both locally and variationally, and examines
some special features of indefinite metric geometry (such as geodesics
never “the shortest distance between two points”). Geodesics provide the key 
to analysis of a wealth of illuminating examples. 

In Chapter X the Riemann curvature tensor is introduced as a measure 
of the failure of a manifold-with-connection to have locally the flat geometry 
of an affine space. We explore its geometry, and that of the related objects 
(scalar curvature, Ricci tensor, etc.) important in mathematics and physics. 

Chapter XI is concerned chiefly with a geometric treatment of how matter 
and its motion must be described, once the Newtonian separation of space and 
time dissolves into one absolute spacetime. It concludes with an explanation 
of the geometric incompatibility of gravitation with any simple flat view of 
spacetime, so leading on to general relativity. 

Chapter XII uses all of the geometry (and many of the examples) previously
set up, to make the interaction of matter and spacetime something like
a visual experience. After introducing the equivalence principle and Einstein’s 
equation, and discussing their cosmic implications, we derive the Schwarzschild 
solution and consider planetary motion. By this point we are equipped both 
to compute physical quantities like orbital periods and the famous advance 
of the perihelion of Mercury, and to see that the paths of the planets (which 
to the flat or Riemannian intuition have little in common with straight lines) 
correspond indeed to geodesics. 

Space did not permit the coherent inclusion of differential forms and integration.
Their use in geometry involves connection and curvature forms with
values not in the real numbers but in the Lie algebra of the appropriate Lie 
group. A second volume will treat these topics and develop the clear exposition
of the tensor geometric tools of solid state physics, which has suffered
worse than most subjects from index debauchery. 

The only feature in which this book is richer than in pictures (to strengthen 
geometric insight) is exercises (to strengthen detailed comprehension). Many 
of the longer and more intricate proofs have been broken down into carefully 
programmed exercises. To work through a proof in this way teaches the mind,
while a displayed page of calculation merely blunts the eye. 

Thus, the exercises are an integral part of the text. The reader need not 
do them all, perhaps not even many, but should read them at least as carefully 
as the main text, and think hard about any that seem difficult. If the “really 
hard” proportion seems to grow, reread the recent parts of the text - doing 
more exercises. 

We are grateful to various sources of support during the writing of this 
book: Poston to the Instituto de Matematica Pura e Aplicada in Rio de 
Janeiro, Montroll’s “Institute for Fundamental Studies” in Rochester, N.Y., 
the University of Oporto, and at Battelle Geneva to the Fonds National Suisse
de la Recherche Scientifique (Grant no. 2.461-0.75) and to Battelle Institute,
Ohio (Grant no. 333-207); Dodson to the University of Lancaster and
(1976-77) the International Centre for Theoretical Physics for hospitality during
a European Science Exchange Programme Fellowship sabbatical year. We
learned from conversation with too many people to begin to list. Each author, 
as usual, is convinced that any remaining errors are the responsibility of the 
other, but errors in the diagrams are due to the draughtsman, Poston, alone. 

Finally, admiration, gratitude and sympathy are due Sylvia Brennan for 
the vast job well done of preparing camera ready copy in Lancaster with the 
authors in two other countries. 


Kit Dodson 
ICTP, Trieste 

Tim Poston 
Battelle, Geneva 
0. Fundamental Not(at)ions


“Therefore is the name of it called Babel; 
because the Lord did there confound the language 
of all the earth”, 

Genesis 11, 9 

Please at least skim through this chapter; if a mathematician, your habits
are probably different somewhere (maybe $f^{-1}$ not $f^{\leftarrow}$) and if a physicist,
perhaps almost everywhere.

1. Sets 

A set, or class, or family is a collection of things, called members, elements,
or points of it. Brackets like { } will always denote a set, with the elements
either listed between them (as, $\{1, 3, 1, 2\}$, the set whose elements are the
numbers 1, 2 and 3 - repetition, and order, make no difference) or specified
by a rule, in the form $\{\, x \mid x \text{ is an integer},\ x^2 = 1 \,\}$ or $\{\, \text{Integer } x \mid x^2 = 1 \,\}$,
which are abbreviations of “the set of all those things $x$ such that $x$ is an
integer and $x^2 = 1$” which is exactly the set $\{1, -1\}$. Read the vertical line |
as “such that” when it appears in a specification of a set by a rule.
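
As an illustration that is not part of the original text, Python's built-in `set` type behaves in the same way: repetition and order are ignored, and a set comprehension plays the role of specification by a rule. The finite search window below is an assumption of the sketch, since a program cannot enumerate all integers.

```python
# Listing elements: repetition and order make no difference.
listed = {1, 3, 1, 2}
assert listed == {1, 2, 3} == {3, 2, 1}

# Specifying by a rule: { x | x is an integer, x^2 = 1 }.
# Assumption: we only search a finite window of integers.
window = range(-10, 11)
by_rule = {x for x in window if x * x == 1}
assert by_rule == {1, -1}
```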

Sets can be collections of numbers (as above), of people ({Henry Crun, 
Peter Kropotkin, Balthazar Vorster}), of sets ({ {Major Bludnok, Oberon}, 
{1,-1},{this book}}), or of things with little in common beyond their 
declared membership of the set ({passive resistance, the set of all wigs, 3, 
Isaac Newton}) though this is uncommon in everyday mathematics. 

We abbreviate “$x$ is a member of the set $S$” to “$x$ is in $S$” or $x \in S$, and
“$x$ is not in $S$” to $x \notin S$. (Thus for instance if $S = \{1, 3, 1, 2, 2\}$ then $x \in S$
means that $x$ is the number 1, or 2, or 3.) If $x$, $y$ and $z$ are all members of $S$,
we write briefly $x, y, z \in S$. A singleton set contains just one element.

If every $x \in S$ is also in another set $T$, we write $S \subseteq T$, and say $S$ is a
subset of $T$. This includes the possibility that $S = T$; that is when $T \subseteq S$ as
well as $S \subseteq T$.

Some sets have special standard symbols. The set of all natural, or
“counting”, numbers like 1, 2, 3, ..., 666, ... etc. is always $\mathbf{N}$ (not vice versa,
but when $\mathbf{N}$ means anything else this should be clear by context. Life is
short, and the alphabet shorter.) There is no consensus whether to include 0
in $\mathbf{N}$; on the grounds of its invention several millennia after the other counting
numbers, and certain points of convenience, we choose not to. The set of all
real (as opposed to complex) numbers like 1, $-\pi$, 8.2736 etc. is called
$\mathbf{R}$. The empty set $\emptyset$ by definition has no members; thus if $S = \{\, x \in \mathbf{N} \mid x^2 = -1 \,\}$
then $S = \emptyset$. Note that $\emptyset \subseteq \mathbf{N} \subseteq \mathbf{R}$. ($\emptyset$ is a subset of any other set:
for “$\emptyset \not\subseteq \mathbf{N}$” would mean “there is an $x \in \emptyset$ which is not a natural number”.
This is false, as there is no $x \in \emptyset$ which is, or is not, anything: hence $\emptyset \subseteq \mathbf{N}$.)
Various other subsets of $\mathbf{R}$ have special symbols. We agree as usual that
among real numbers

$a < b$ means “$a$ is strictly less than $b$” or “$b - a$ is not zero or negative”
$a \le b$ means “$a$ is less than or equal to $b$” or “$b - a$ is not negative”


(note that for any $a \in \mathbf{R}$, $a \le a$). Then we define the intervals


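$[a, b] = \{\, x \in \mathbf{R} \mid a \le x \le b \,\}$ (including ends)
$]a, b[ = \{\, x \in \mathbf{R} \mid a < x < b \,\}$ (not including ends)
$[a, b[ = \{\, x \in \mathbf{R} \mid a \le x < b \,\}$ (including one end)
$]a, b] = \{\, x \in \mathbf{R} \mid a < x \le b \,\}$ (including one end).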


When $b < a$, the definitions imply that all of these sets equal $\emptyset$; if
$a = b$, then $[a, b] = \{a\} = \{b\}$ and the rest are empty. By convention the
half-unbounded intervals are written similarly: if $a, b \in \mathbf{R}$ then


$]-\infty, b] = \{\, x \mid x \le b \,\}$, $[a, \infty[ = \{\, x \mid x \ge a \,\}$,
$]-\infty, b[ = \{\, x \mid x < b \,\}$, $]a, \infty[ = \{\, x \mid x > a \,\}$


by definition, without thereby allowing $-\infty$ or $\infty$ as “numbers”. We also call
$\mathbf{R}$ itself an interval. (We may define the term interval itself either by gathering
together the above definitions of all particular cases or - anticipating
Chapter III - as a convex subset of $\mathbf{R}$.)

By $a \ge b$, $a > b$ we mean $b \le a$, $b < a$ respectively.

A finite subset $S = \{a_1, a_2, \dots, a_n\} \subseteq \mathbf{R}$ must have a least member,
$\min S$, and a greatest, $\max S$. An infinite set may, but need not, have extreme
members. For example, $\min[0,1] = 0$, $\max[0,1] = 1$, but neither $\min\,]0,1[$ nor
$\max\,]0,1[$ exists. For any $t \in\, ]0,1[$, $\tfrac{1}{2}t < t < \tfrac{1}{2}(t + 1)$, which gives elements of
$]0,1[$ strictly less and greater than $t$. So $t$ can be neither a minimum nor a
maximum.
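
The argument that the open unit interval has no least or greatest member can be spot-checked numerically. The following is only a sketch on a few sample points, using exact rational arithmetic; the helper names `strictly_inside` and `witnesses` are hypothetical and the check is an illustration, not a proof.

```python
from fractions import Fraction

def strictly_inside(t):
    """True if t lies in the open interval ]0, 1[."""
    return 0 < t < 1

def witnesses(t):
    """Return (t/2, (t+1)/2): members of ]0, 1[ strictly below and above t."""
    half = Fraction(1, 2)
    return half * t, half * (t + 1)

# Sample points, including ones very close to either end.
samples = [Fraction(1, 1000), Fraction(1, 3), Fraction(999, 1000)]
for t in samples:
    lo, hi = witnesses(t)
    assert strictly_inside(lo) and strictly_inside(hi)
    assert lo < t < hi   # so t is neither a minimum nor a maximum
```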

We shall be thinking of $\mathbf{R}$ far more as a geometric object, with its points
as positions, than as algebraic with its elements as numbers. (These different
viewpoints are represented by different names for it, as the real line or the
real number system or field.) Its geometry, which we partly explore in VII.§4,
has more subtlety than high school treatments lead one to realise.
Fig. 1.1


If $S$ and $T$ are any two sets their intersection is the set (Fig. 1.1a)

$S \cap T = \{\, x \in S \mid x \in T \,\}$

and their union is (Fig. 1.1b)

$S \cup T = \{\, x \mid x \in S, \text{ or } x \in T, \text{ or both} \,\}$.

By $S$ less $T$ we mean the set (Fig. 1.1a)

$S \setminus T = \{\, x \in S \mid x \notin T \,\}$.

If we have an indexing set $K$ such as $\{1, 2, 3, 4\}$ or $\{3, \text{Fred}, \text{Jam}\}$ labelling
sets $S_3$, $S_{\text{Fred}}$, $S_{\text{Jam}}$ (one for each $k \in K$) we denote the resulting set
of sets $\{S_3, S_{\text{Fred}}, S_{\text{Jam}}\}$ by $\{S_k\}_{k \in K}$. $K$ may well be infinite (for instance
$K = \mathbf{N}$ or $K = \mathbf{R}$). The union of all of the $S_k$ is

$\bigcup_{k \in K} S_k = \{\, x \mid x \in S_k \text{ for some } k \in K \,\}$

and their intersection is

$\bigcap_{k \in K} S_k = \{\, x \mid x \in S_k \text{ for all } k \in K \,\}$,

which obviously reduce to the previous definitions when $K$ has exactly two
members.

To abbreviate expressions like those above, we sometimes write “for all”
as $\forall$, “there exists” as $\exists$, and abbreviate “such that” to “s.t.”. Then

$\bigcap_{k \in K} S_k = \{\, x \mid x \in S_k \ \forall k \in K \,\}$,
$\bigcup_{k \in K} S_k = \{\, x \mid \exists k \in K \text{ s.t. } x \in S_k \,\}$.
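
For a finite indexed family, the "for some" and "for all" definitions translate almost word for word into Python's `any` and `all`. The family below, its contents, and the name `family` are made up for illustration; only the index set {3, Fred, Jam} echoes the text.

```python
# Hypothetical indexed family {S_k} with indexing set K = {3, "Fred", "Jam"}.
family = {
    3: {1, 2, 3},
    "Fred": {2, 3, 4},
    "Jam": {3, 4, 5},
}
K = family.keys()
universe = set().union(*family.values())

# Union: x is in S_k for SOME k in K.
union = {x for x in universe if any(x in family[k] for k in K)}
# Intersection: x is in S_k for ALL k in K.
intersection = {x for x in universe if all(x in family[k] for k in K)}

assert union == {1, 2, 3, 4, 5}
assert intersection == {3}

# With exactly two indices these reduce to the binary operations.
two = {k: family[k] for k in (3, "Fred")}
assert set.union(*two.values()) == two[3] | two["Fred"]
assert set.intersection(*two.values()) == two[3] & two["Fred"]
```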


If $S \cap T = \emptyset$, $S$ and $T$ are disjoint; $\{S_k\}_{k \in K}$ is disjoint if $S_k \cap S_l = \emptyset$
$\forall k \ne l \in K$.

When $K = \{1, \dots, n\}$ we write $\bigcup_{k \in K} S_k$ as $\bigcup_{i=1}^{n} S_i$ or $S_1 \cup S_2 \cup \dots \cup S_n$,
by analogy with the expression $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$ where the $x_i$
are things that can be added, such as members of $\mathbf{N}$, of $\mathbf{R}$, or (cf. Chap. I) of
a vector space; similarly for $\bigcap_{i=1}^{n} S_i = S_1 \cap S_2 \cap \dots \cap S_n$.

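We shorten “implies” to $\Rightarrow$, “is implied by” to $\Leftarrow$, and “$\Rightarrow$ and $\Leftarrow$” to
$\Leftrightarrow$. Thus for example,

$x \in \mathbf{N} \Rightarrow x^2 \in \mathbf{N}$, $x \in \mathbf{R} \Leftarrow x \in \mathbf{N}$,

I was married to John $\Leftrightarrow$ John was married to me, or in compound use
$[x \in S \Rightarrow x \in T] \Leftrightarrow [x \in T \Leftarrow x \in S]$.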

The product of two sets $X$ and $Y$ is the set of ordered pairs
$X \times Y = \{\, (x, y) \mid x \in X,\ y \in Y \,\}$.

The commonest example is the description of the Euclidean plane by Cartesian
coordinates $(x, y) \in \mathbf{R} \times \mathbf{R}$. Note the importance of the ordering: though
$\{1, 0\}$ and $\{0, 1\}$ are the same subset of $\mathbf{R}$, $(1, 0)$ and $(0, 1)$ are different elements
of $\mathbf{R} \times \mathbf{R}$ (one “on the x-axis” the other “on the y-axis”). $\mathbf{R} \times \mathbf{R}$ is often
written $\mathbf{R}^2$. We generally identify $(\mathbf{R} \times \mathbf{R}) \times \mathbf{R}$ and $\mathbf{R} \times (\mathbf{R} \times \mathbf{R})$, whose elements
are strictly of the forms $((x, y), z)$ and $(x, (y, z))$, with the set $\mathbf{R}^3$ of ordered
triples labelled $(x, y, z)$, or $(x^1, x^2, x^3)$ according to taste and convenience.
Here the 1, 2, 3 on the $x$’s are position labels for numbers and not powers.
Similarly for the set

$\mathbf{R}^n = \mathbf{R} \times \mathbf{R} \times \dots \times \mathbf{R} = \{\, (x^1, x^2, \dots, x^n) \mid x^1, \dots, x^n \in \mathbf{R} \,\}$

of ordered n-tuples. (Note that the set $\mathbf{R}^1$ of one-tuples is just $\mathbf{R}$.)
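
The distinction between sets and ordered pairs, and the identification of the two bracketings of a triple product, can be seen on small finite sets. This is a sketch under the assumption that finite sets stand in for $\mathbf{R}$; the helper names `flatten_left` and `flatten_right` are hypothetical.

```python
from itertools import product

X = {0, 1}
Y = {"a", "b"}
XY = set(product(X, Y))            # X x Y as a set of ordered pairs
assert len(XY) == len(X) * len(Y)

# Order matters for pairs but not for sets.
assert {1, 0} == {0, 1}
assert (1, 0) != (0, 1)

def flatten_left(p):
    """Send ((x, y), z) to (x, y, z)."""
    (x, y), z = p
    return (x, y, z)

def flatten_right(p):
    """Send (x, (y, z)) to (x, y, z)."""
    x, (y, z) = p
    return (x, y, z)

left = set(product(product(X, X), X))     # elements ((x, y), z)
right = set(product(X, product(X, X)))    # elements (x, (y, z))
triples = set(product(X, repeat=3))       # elements (x, y, z)

assert left != right                      # strictly different sets...
assert {flatten_left(p) for p in left} == triples    # ...both identified
assert {flatten_right(p) for p in right} == triples  # with the set of triples
```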

A less “flat” illustration arises from the unit circle

$S^1 = \{\, (x, y) \in \mathbf{R}^2 \mid x^2 + y^2 = 1 \,\}$.

The product $S^1 \times [1, 2]$ is a subset of $\mathbf{R}^2 \times \mathbf{R}$, since $S^1 \subseteq \mathbf{R}^2$, $[1, 2] \subseteq \mathbf{R}$.
(Fig. 1.2, with some sample points labelled.)

Fig. 1.2 (one labelled sample point is $((0, 1), 2)$)

$S^1$ is one of the n-spheres:

$S^n = \{\, (x^1, \dots, x^{n+1}) \mid (x^1)^2 + \dots + (x^{n+1})^2 = 1 \,\} \subseteq \mathbf{R}^{n+1}$.

$S^2$ is the usual “unit sphere” of 3-dimensional Cartesian geometry, and $S^0$ is
simply $\{-1, 1\} \subseteq \mathbf{R}^1$. The higher spheres are logically no different, but take
a little practice to “visualise”.

A relation $\rho$ on a set $X$ is a subset of $X \times X$. We generally abbreviate
$(x, y) \in \rho$ to $x \,\rho\, y$. Typical cases are

$\{\, (x, y) \in \mathbf{R}^2 \mid y - x \text{ is not negative} \,\} \subseteq \mathbf{R} \times \mathbf{R}$,

this is the relation $\le$ used above, and

$\{\, (x, y) \mid x, y \text{ are people},\ x \text{ is married to } y \,\}$,

on the set of people. Various kinds of relation have special names; for instance,
$\le$ is an example of an order relation. We need only define one kind
in detail here:

An equivalence relation $\sim$ on $X$ is a relation such that

(i) $x \in X \Rightarrow x \sim x$

(ii) $x \sim y \Rightarrow y \sim x$

(iii) $x \sim y$ and $y \sim z \Rightarrow x \sim z$.

For example, $\{\, (x, y) \mid x^2 = y^2 \,\}$ is an equivalence relation on a set of
numbers, and $\rho = \{\, (x, y) \mid x \text{ has the same birthday as } y \,\}$ is an equivalence
relation on the set of mammals. On the other hand,
$\sigma = \{\, (x, y) \mid x \text{ is married to the husband of } y \,\}$ is an equivalence relation on the set of
wives in many cultures, but not on the set of women, by the failure of (i).

The important feature of an equivalence relation is that it partitions $X$
into equivalence classes. These are the subsets $[x] = \{\, y \in X \mid y \sim x \,\}$, with
the properties

(i) $x \in [x]$, we say $x$ is a representative of $[x]$,

(ii) either $[x] \cap [y] = \emptyset$, or $[x] = [y]$,

(iii) the union of all the classes is $X$.
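
The three properties of equivalence classes can be checked directly on a small example. The sketch below uses the relation "$x^2 = y^2$" on a finite range of integers; the function name `equivalence_classes` is hypothetical, and the code assumes (without verifying) that the relation passed in really is an equivalence relation.

```python
def equivalence_classes(X, related):
    """Partition X into classes [x] = { y in X | y ~ x }.

    Assumes `related` is reflexive, symmetric and transitive on X.
    """
    classes = []
    for x in X:
        cls = frozenset(y for y in X if related(y, x))
        if cls not in classes:
            classes.append(cls)
    return classes

X = range(-3, 4)
same_square = lambda x, y: x * x == y * y
classes = equivalence_classes(X, same_square)

# (i) every x lies in its own class.
assert all(x in frozenset(y for y in X if same_square(y, x)) for x in X)
# (ii) two classes are either disjoint or equal.
assert all(a == b or not (a & b) for a in classes for b in classes)
# (iii) the union of all the classes is X.
assert frozenset().union(*classes) == frozenset(X)
```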

This device is used endlessly in mathematics, from the construction of the 
integers on up. For, very often, the set of equivalence classes possesses a nicer 
structure than $X$ itself. We construct some vector spaces with it (in II.§3,
VII.§1). The example on mammals produces classes that interest astrologers,
and $\sigma$ partitions wives into ... ?


2. Functions 


A function, mapping or map $f : X \to Y$ between the sets $X$ and $Y$ may be
thought of as a rule, however specified, giving for each $x \in X$ exactly one
$y \in Y$. Technically, it is best described as a subset $f \subseteq X \times Y$ such that

F i) $x \in X \Rightarrow \exists y \in Y$ s.t. $(x, y) \in f$

F ii) $(x, y), (x, y') \in f \Rightarrow y = y'$.

These rules say that for each $x \in X$, (i) there is a (ii) unique $y \in Y$ that we
may label $f(x)$ or $fx$. A map may be specified simply by a list, such as

$f$ : {Peter Kropotkin, Henry Crun, Balthazar Vorster}

$\to \{\, x \mid x \text{ is a possible place} \,\}$

Peter Kropotkin $\mapsto$ Switzerland

Henry Crun $\mapsto$ Balham Gas Works

Balthazar Vorster $\mapsto$ Robben Island
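
The description of a function as a subset of $X \times Y$ satisfying F i) and F ii) can be tested mechanically when the sets are finite. The following sketch is an illustration only: `is_function` is a hypothetical helper, and the three-element set `Y` is an assumed finite stand-in for "the set of possible places".

```python
def is_function(f, X, Y):
    """Check F i) and F ii) for a set of pairs f, viewed as a subset of X x Y."""
    if not all(x in X and y in Y for (x, y) in f):
        return False  # not even a subset of X x Y
    # F i): every x in X appears as a first coordinate.
    total = all(any(a == x for (a, _) in f) for x in X)
    # F ii): no x is paired with two different values.
    single_valued = all(y == y2 for (x, y) in f for (x2, y2) in f if x == x2)
    return total and single_valued

X = {"Peter Kropotkin", "Henry Crun", "Balthazar Vorster"}
Y = {"Switzerland", "Balham Gas Works", "Robben Island"}
f = {
    ("Peter Kropotkin", "Switzerland"),
    ("Henry Crun", "Balham Gas Works"),
    ("Balthazar Vorster", "Robben Island"),
}

assert is_function(f, X, Y)
# Adding a second value for one x breaks F ii).
assert not is_function(f | {("Henry Crun", "Switzerland")}, X, Y)
# Removing the value for one x breaks F i).
assert not is_function(f - {("Henry Crun", "Balham Gas Works")}, X, Y)
```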


An example of a function specified by a rule allowing for several possibilities
is

$g : x \mapsto \begin{cases} 1 & \text{if } x \in \mathbf{N} \text{ and } x > 0 \\ -1 & \text{if } x \in \mathbf{N} \text{ and } x < 0 \\ 1 & \text{if } x \notin \mathbf{N} \end{cases}$

(Fig. 2.1a uses artistic license in representing the “zero width” gaps in the
graph of $g$ - which, as a subset of $\mathbf{R} \times \mathbf{R}$, technically is $g$.) Often we shall
specify a map by one or more formulae, for example (Fig. 2.1b,c)


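$h : \mathbf{R} \to \mathbf{R} : x \mapsto \begin{cases} x^2 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$,

$q : \,]0, \infty[\, \to \mathbf{R} : x \mapsto \log x$.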


All these satisfy F i) and F ii). Notice the way we have used $\to$ to specify
the sets a function is between and $\mapsto$ to specify its “rule”. (Technically,
$q : x \mapsto \log x$ is short for $q = \{\, (x, y) \mid y = \log x \,\} \subseteq\, ]0, \infty[\, \times \mathbf{R}$.) This
distinction between $\to$ and $\mapsto$ will be consistent throughout the book. We
also read “$f : X \to Y$” as the statement “$f$ is a function from $X$ to $Y$”.

Fig. 2.1

If $f : X \to Y$, we call $X$ its domain, $Y$ its range, and the subset
$\{\, y \mid y = f(x) \text{ for some } x \in X \,\}$ of $Y$ its image, denoted by $f(X)$ or $fX$. We
generalise this last notation: if $S$ is any subset of $X$, set

$fS = f(S) = \{\, y \mid y = f(x) \text{ for some } x \in S \,\}$

the image of $S$ by $f$. Note that $f(\{x\}) = \{f(x)\}$ for any $x \in X$, as sets.

We are committing a slight “abuse of language” in using $f$ to denote
both a map $X \to Y$ and the function

$\{\, S \mid S \subseteq X \,\} \to \{\, T \mid T \subseteq Y \,\} : S \mapsto \{\, y \mid \exists x \in S \text{ s.t. } f(x) = y \,\}$

that it defines between sets of subsets: generally we shall insist firmly that
the domain and range are parts of the function’s identity, just as much as
the rule giving it, and $S \mapsto fS$ is different in all these ways from $x \mapsto fx$.
This precision about domain and range becomes crucial when we define the
composite of two maps $g : X \to Y$ and $f : Y \to Z$ by

$f \circ g : X \to Z : x \mapsto f(g(x))$; so $(f \circ g)(x)$ is “f of g of x”.

If, say, we wish to compose $q$ and $h$ above, we have

$h \circ q : \,]0, \infty[\, \to \mathbf{R} : x \mapsto \begin{cases} (\log x)^2 & \text{if } x \ge 1 \\ 0 & \text{if } 1 > x > 0 \end{cases}$

but $q \circ h$ cannot satisfy F i; how can we define $q \circ h(-1)$, since $\log 0$ does not
exist? Or consider $s : \mathbf{R} \to \mathbf{R} : x \mapsto \sin x$:

$x \in \mathbf{R} \Rightarrow sx \le 1 \Rightarrow [\log(sx) \le 0 \text{ when defined}]$

$\Rightarrow [\log(\log(sx)) \text{ never defined}]$


so we cannot define $q \circ q \circ s$ anywhere. (Note that formally “differentiating”
$x \mapsto \log \log \sin x$ by the rules of school calculus gives a formula that does
define something for some values of $x$. What, if anything, does the rate of
change with $x$ of a nowhere-defined function mean? What is the sound of
one hand clapping?) So insisting that $X$ and $Y$ are “part of” $f : X \to Y$ is
a vital safety measure, not pedantry.
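
The safety role of the domain can be made concrete in code. The sketch below is an assumption-laden illustration, not the book's construction: `q` explicitly rejects arguments outside its domain, `compose` and `defined` are hypothetical helpers, and floating-point numbers stand in for real numbers.

```python
import math

def h(x):
    """h : R -> R, sending x to x^2 if x >= 0 and to 0 otherwise."""
    return x * x if x >= 0 else 0.0

def q(x):
    """q : ]0, oo[ -> R, x -> log x; rejects anything outside its domain."""
    if not x > 0:
        raise ValueError(f"{x} is not in ]0, oo[")
    return math.log(x)

def compose(f, g):
    """Return f o g, i.e. x -> f(g(x))."""
    return lambda x: f(g(x))

def defined(f, x):
    """True if f(x) can be evaluated without leaving a domain."""
    try:
        f(x)
        return True
    except ValueError:
        return False

h_after_q = compose(h, q)      # fine: every value of q lies in the domain of h
q_after_h = compose(q, h)      # not a map on R: h(-1) = 0 is outside ]0, oo[
q_q_s = compose(q, compose(q, math.sin))

assert h_after_q(0.5) == 0.0             # log 0.5 < 0, so h sends it to 0
assert not defined(q_after_h, -1)        # fails exactly as the text describes
# On a sample grid, log(log(sin x)) is never defined.
assert not any(defined(q_q_s, k / 10) for k in range(-100, 101))
```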

So we should not write down $f \circ g$ unless (range of $g$) = (domain of $f$).
We may so far abuse language as to write $f \circ g$ for $x \mapsto f(gx)$ when
(image of $g$) = (domain of $f$) or when (range of $g$) $\subseteq$ (domain of $f$); this
latter is really the triple composite $f \circ i \circ g$ with the inclusion map

$i$ : (range of $g$) $\hookrightarrow$ (domain of $f$) : $x \mapsto x$

quietly suppressed. Note also the amalgam $\hookrightarrow$ of $\subseteq$ and $\to$ for inclusions.

We sometimes want to change a function by reducing its domain; if
$f : X \to Y$ and $S \subseteq X$ we define $f$ restricted to $S$ or the restriction of $f$ to
$S$ as

$f|_S : S \to Y : x \mapsto f(x)$

or equivalently $f|_S = f \circ i$, where $i$ is the inclusion $S \hookrightarrow X$.

Notice that $f|_S$ may have a simpler expression than $f$: for $h : \mathbf{R} \to \mathbf{R}$
as above, $h|_{[0, \infty[}$ is given simply by $x \mapsto x^2$. It thus coincides with $k|_{[0, \infty[}$
where $k : \mathbf{R} \to \mathbf{R} : x \mapsto x^2$, though $h(x)$ is not the same as $k(x)$ (we write
$h(x) \ne k(x)$ for short) if $x < 0$. This is another reason for considering the
domain as “part of” the function: if a change in domain can make different
functions the same, the change is not trivial. (To regard $f$ and $f|_S$ as the
same function and allow them the same name would lead to “$h : \mathbf{R} \to \mathbf{R}$ is the
same as $h|_{[0, \infty[}$ is the same as $k|_{[0, \infty[}$ is the same as $k$” which is ridiculous.)
When we have this situation of two functions $f, g : X \to Y$, $S \subseteq X$, and
$f|_S = g|_S$, we say $f$ and $g$ agree on $S$.
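
Restriction, and the idea of two maps agreeing on a subset, can be illustrated with finite maps stored as dictionaries. This is a sketch under the assumption that a small range of integers stands in for the real line; `restrict` is a hypothetical helper.

```python
def restrict(f, S):
    """Restriction of a finite map (a dict) to S: keep only the keys in S."""
    missing = set(S) - set(f)
    if missing:
        raise ValueError(f"S is not a subset of the domain: {missing}")
    return {x: f[x] for x in S}

# Finite samples standing in for R and [0, oo[ (an assumption of this sketch).
R_sample = range(-3, 4)
nonneg = [x for x in R_sample if x >= 0]

h = {x: (x * x if x >= 0 else 0) for x in R_sample}
k = {x: x * x for x in R_sample}

assert h != k                                       # different on the whole sample
assert restrict(h, nonneg) == restrict(k, nonneg)   # but they agree on the nonnegative part
```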

A function $f : X \to Y$ defines, besides $S \mapsto fS$ from subsets of $X$ to
subsets of $Y$, a map in the other direction between subsets. It is defined for
all $T \subseteq Y$ by

$f^{\leftarrow}(T) = \{\, x \mid f(x) \in T \,\} \subseteq X$,

the inverse image of $T$ by $f$. If $fX \cap T = \emptyset$, then $f^{\leftarrow}(T) = \emptyset$; the inverse
image of a set outside the image of $f$ is empty. (Likewise if $fS \cap T = \emptyset$,
$(f|_S)^{\leftarrow}(T) = \emptyset$.) Some images and inverse images are illustrated in Fig. 2.2.
There $f$ is represented as taking any $x \in X$ to the point directly below it -
a pictorial device we shall use constantly.
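
Images and inverse images of subsets are easy to compute for a finite map. The sketch below stores a map as a dictionary; the example map (squaring on a five-point domain) and the helper names `image` and `inverse_image` are assumptions made for illustration.

```python
def image(f, S):
    """The image { f(x) | x in S } of a subset S of the domain."""
    return {f[x] for x in S}

def inverse_image(f, T):
    """The inverse image { x | f(x) in T }, defined for every T,
    whether or not f has an inverse map."""
    return {x for x in f if f[x] in T}

f = {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}   # x -> x^2 on a small domain

assert image(f, {-1, 1}) == {1}
assert image(f, {2}) == {f[2]}                 # image of a singleton is a singleton
assert inverse_image(f, {1}) == {-1, 1}        # more than one point
assert inverse_image(f, {0}) == {0}            # exactly one point
assert inverse_image(f, {2, 3}) == set()       # T misses the image entirely
```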

In general $f^{\leftarrow}$, a map taking subsets of $Y$ to subsets of $X$, does not come
from a map $Y \to X$ in the way that $S \mapsto fS$ does come from $f : X \to Y$.
If for every $y \in Y$ we had $f^{\leftarrow}(\{y\})$ a set containing exactly one point, as we
have for $y$ on the line $C$ in Fig. 2.2 (rather than none, as to the right of $C$,
or more than one, as to the left) then we can define $f^{\leftarrow} : Y \to X$ by the
condition $f^{\leftarrow}(y)$ = the unique member of $f^{\leftarrow}(\{y\})$; otherwise not. We can
break this necessary condition “every $f^{\leftarrow}(\{y\})$ contains exactly one point”
into two, that are often useful separately:

$f : X \to Y$ is injective or into or an injection if for any $y \in Y$, $f^{\leftarrow}(\{y\})$ contains at most one point.

Equivalently, if $f(x) = f(x') \in Y \Rightarrow x = x'$.

$f : X \to Y$ is onto or surjective or a surjection (dog latin for “throwing onto”) if for any $y \in Y$, $f^{\leftarrow}(\{y\}) \neq \emptyset$. This means it contains at least one point. Equivalently, if $fX = Y$ (not just $fX \subseteq Y$, which is true by definition).

$f : X \to Y$ is bijective or a bijection if it is both injective and surjective.
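The three definitions can be checked mechanically for finite functions by counting the points in each inverse image $f^{\leftarrow}(\{y\})$. This is a sketch under the assumption that a function is stored as a Python dict and its range is passed separately; the names `fibre`, `is_injective`, `is_surjective` and `is_bijective` are hypothetical.

```python
def fibre(f, y):
    """The inverse image of the one-point set {y}."""
    return {x for x in f if f[x] == y}


def is_injective(f, Y):
    # every fibre contains at most one point
    return all(len(fibre(f, y)) <= 1 for y in Y)


def is_surjective(f, Y):
    # every fibre contains at least one point
    return all(len(fibre(f, y)) >= 1 for y in Y)


def is_bijective(f, Y):
    return is_injective(f, Y) and is_surjective(f, Y)


# Hypothetical examples: the same rule with different ranges behaves differently.
into_only = ({0: "a", 1: "b"}, {"a", "b", "c"})   # injective, not surjective
onto_only = ({0: "a", 1: "a"}, {"a"})             # surjective, not injective
both = ({0: "a", 1: "b"}, {"a", "b"})             # bijective

for f, Y in (into_only, onto_only, both):
    print(is_injective(f, Y), is_surjective(f, Y), is_bijective(f, Y))
```

Note that the range `Y` has to be supplied explicitly: surjectivity is a property of the function together with its declared range, not of the rule alone.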

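There exists a function $f^{\leftarrow} : Y \to X$ such that $\{f^{\leftarrow}(y)\} = f^{\leftarrow}(\{y\})$ $\forall y \in Y$, if and only if $f$ is bijective. For if there is such an $f^{\leftarrow}$, each $f^{\leftarrow}(\{y\}) = \{f^{\leftarrow}(y)\} \neq \emptyset$ since $f^{\leftarrow}$ satisfies F i, and

$$f(x) = f(x') = y, \text{ say} \;\Rightarrow\; x, x' \in f^{\leftarrow}(\{y\}) = \{f^{\leftarrow}(y)\}$$

$$\Rightarrow\; x = x' \quad \text{since } f^{\leftarrow} \text{ satisfies F ii}$$

so $f$ is bijective. Conversely if $f$ is bijective the subset $g = \{\, (y, x) \mid (x, y) \in f \,\} \subseteq Y \times X$ satisfies F i, F ii for a function $Y \to X$ and $\{g(y)\} = f^{\leftarrow}(\{y\})$ $\forall y \in Y$, so we can put $f^{\leftarrow} = g$. Notice that $f^{\leftarrow}$, when it exists, is also a bijection.

(It is common to write $f^{-1}$ for $f^{\leftarrow}$, but this habit leads to all sorts of confusion between $f^{\leftarrow}(x)$ and $(f(x))^{-1} = 1/f(x)$, and should be stamped out.)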

We can state these ideas in terms of functions alone, not mentioning members of sets, if we define for any set $X$ the identity map $I_X : X \to X : x \mapsto x$. Now the following two statements should be obvious, otherwise the reader should prove them as a worthwhile exercise:

A function $f : X \to Y$ is injective if and only if
$\exists g : Y \to X$ s.t. $g \circ f = I_X : X \to X$.

A function $f : X \to Y$ is surjective if and only if
$\exists g : Y \to X$ s.t. $f \circ g = I_Y : Y \to Y$.

Neither case need involve a unique $g$. If $X = \{0,1\}$, $Y = [0,1]$ then $i : X \hookrightarrow Y : x \mapsto x$ (Fig. 2.3a) is injective with infinitely many candidates for $g$ such that $g \circ f = I_X$ with $f = i$ (for instance take all of $[0, \tfrac{1}{2}]$ to 0 and all of $]\tfrac{1}{2}, 1]$ to 1). Similarly the unique (why?) map $[0,1] \to \{0\}$ is surjective, and any $g : \{0\} \to [0,1]$ (say, $0 \mapsto \tfrac{1}{2}$) has $f \circ g = I_{\{0\}}$ (Fig. 2.3b). But if $f$ is bijective by the existence of $g : Y \to X$ such that $g \circ f = I_X$ and $g' : Y \to X$ such that $f \circ g' = I_Y$ then we have

$$g = g \circ I_Y = g \circ (f \circ g') = (g \circ f) \circ g' = I_X \circ g' = g' .$$

By the same argument, if $h : Y \to X$ is any other map with $h \circ f = I_X$ then it must equal $g'$ and hence $g$, or with $f \circ h = I_Y$ it must equal $g$. Then we have a unique inverse map that we may call $f^{\leftarrow}$ as above, with $f^{\leftarrow} \circ f = I_X$, $f \circ f^{\leftarrow} = I_Y$. We may omit the subscript when the domain of the identity is plain from the context.

Fig. 2.3
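The contrast between non-unique one-sided inverses and the unique inverse of a bijection can be seen by brute force on small finite sets. The sketch below replaces the interval $[0,1]$ by the three-point set `{0, 1, 2}` purely so that all maps can be enumerated; that substitution, and the helper names, are assumptions made for illustration.

```python
from itertools import product


def all_maps(domain, codomain):
    """Yield every function domain -> codomain as a dict."""
    domain = sorted(domain)
    for values in product(sorted(codomain), repeat=len(domain)):
        yield dict(zip(domain, values))


def compose(g, f):
    """g o f: apply f first, then g."""
    return {x: g[f[x]] for x in f}


def identity(S):
    return {s: s for s in S}


X = {0, 1}
Y = {0, 1, 2}          # finite stand-in for the interval [0, 1]
i = {0: 0, 1: 1}       # the inclusion X -> Y, injective but not surjective

# Every g : Y -> X with g o i = I_X; the value g(2) is unconstrained.
left_inverses = [g for g in all_maps(Y, X) if compose(g, i) == identity(X)]
print(len(left_inverses))

# For a bijection the one-sided inverses exist, are unique, and coincide.
f = {0: 1, 1: 2, 2: 0}
lefts = [g for g in all_maps(Y, Y) if compose(g, f) == identity(Y)]
rights = [g for g in all_maps(Y, Y) if compose(f, g) == identity(Y)]
print(lefts, rights)
```

With the real interval in place of `Y` there are infinitely many left inverses rather than two, but the mechanism is the same: points outside the image of `i` may be sent anywhere.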

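When maps with various ranges and domains are around, we shall sometimes gather them into a composite diagram such as

$$X \xrightarrow{\;f\;} W \xrightarrow{\;g\;} Z \xrightarrow{\;h\;} Y \xrightarrow{\;q\;} T ,$$

or

[Diagram: a square of maps with $X \to W$ along the top, $M \to T$ along the bottom with its arrow labelled $G$, and vertical arrows down each side.]

where the domain and range of each map are given by the beginning and end, respectively, of its arrow.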

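This helps keep track of which compositions are legitimate. For instance, if $f : X \to Y$ and $g : Y \to Z$ are both bijections, then we have two diagrams

$$X \xrightarrow{\;f\;} Y \xrightarrow{\;g\;} Z \quad \text{and} \quad X \xleftarrow{\;f^{\leftarrow}\;} Y \xleftarrow{\;g^{\leftarrow}\;} Z$$

which make clear that we can form the composites $g \circ f$ and $f^{\leftarrow} \circ g^{\leftarrow}$, but not $f \circ g$ (since $g(y) \in Z$, and $f(z)$ is not defined for $z \in Z$) or $g^{\leftarrow} \circ f^{\leftarrow}$. The composite $g \circ f$ is again a bijection, with inverse $(g \circ f)^{\leftarrow} = f^{\leftarrow} \circ g^{\leftarrow}$, since

$$(f^{\leftarrow} \circ g^{\leftarrow}) \circ (g \circ f) = f^{\leftarrow} \circ (g^{\leftarrow} \circ g) \circ f = f^{\leftarrow} \circ I_Y \circ f = f^{\leftarrow} \circ f = I_X$$

$$(g \circ f) \circ (f^{\leftarrow} \circ g^{\leftarrow}) = g \circ (f \circ f^{\leftarrow}) \circ g^{\leftarrow} = g \circ I_Y \circ g^{\leftarrow} = g \circ g^{\leftarrow} = I_Z .$$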
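The rule that the inverse of a composite is the composite of the inverses in the opposite order can be checked on a small example. This is a sketch with hypothetical finite bijections `f` and `g`; the dict representation and the helper names are assumptions.

```python
def compose(g, f):
    """g o f: apply f first, then g (defined only if f's values are keys of g)."""
    return {x: g[f[x]] for x in f}


def inverse(f):
    """The inverse of a bijection stored as a dict."""
    return {y: x for x, y in f.items()}


f = {1: "a", 2: "b", 3: "c"}        # hypothetical bijection X -> Y
g = {"a": 10, "b": 20, "c": 30}     # hypothetical bijection Y -> Z

gf = compose(g, f)                  # X -> Z
print(inverse(gf) == compose(inverse(f), inverse(g)))

# The illegitimate composite f o g fails, as the diagram predicts:
try:
    compose(f, g)
except KeyError as missing:
    print("f is not defined at", missing)
```

The `KeyError` is the programming counterpart of the remark that $f(z)$ is not defined for $z \in Z$.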

We assume the existence and familiar properties of certain common functions: notably

$$+ : \mathbb{R} \times \mathbb{R} \to \mathbb{R} : (x, y) \mapsto x + y ,$$

$$\times : \mathbb{R} \times \mathbb{R} \to \mathbb{R} : (x, y) \mapsto xy ,$$

$$- : \mathbb{R} \to \mathbb{R} : x \mapsto -x ,$$

$$\text{modulus} : \mathbb{R} \to \mathbb{R} : x \mapsto |x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0 \end{cases}$$

whose precise definitions involve that of $\mathbb{R}$ itself, and the corresponding division, subtraction and polynomial functions (such as $x \mapsto x^3 + x$) that can be defined from them. When constructing examples we shall often use (as already above) the functions

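$$\exp : \mathbb{R} \to \,]0, \infty[\, : x \mapsto \exp(x) = e^x ,$$

(its series is mentioned only in IX.6.2), its inverse natural logarithm

$$\log : \,]0, \infty[\, \to \mathbb{R} : x \mapsto (\text{that } y \text{ s.t. } e^y = x) ,$$

the trigonometrical functions

$$\sin : \mathbb{R} \to \mathbb{R} , \quad \cos : \mathbb{R} \to \mathbb{R} ,$$

and (in IX§6 only) the hyperbolic functions

$$\sinh : \mathbb{R} \to \mathbb{R} , \quad \cosh : \mathbb{R} \to \mathbb{R} ,$$

taking as given their standard properties (various identities are stated in Exercise IX.6.2). Among these properties we include their differentials

$$\frac{d}{dx}(\exp)(x) = \exp(x) , \quad \frac{d}{dx}(\log)(x) = \frac{1}{x} ,$$

$$\frac{d}{dx}(\sin)(x) = \cos x , \quad \frac{d}{dx}(\cos)(x) = -\sin x ,$$

$$\frac{d}{dx}(\sinh)(x) = \cosh x , \quad \frac{d}{dx}(\cosh)(x) = \sinh x ,$$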

since to prove these would involve adding to the precise treatment of differentiation in Chap. VIII the material on infinite sums necessary to define exp,
log, sin and cos rigorously. This seems unnecessary - when the functions are 
already familiar - for the purposes of this book. (The physics student, who 
may not have seen them precisely defined, should, if assailed by Doubt, refer 
to any elementary Analysis text, such as [Moss and Roberts].) 

Finally we define the map named after Kronecker,

$$\delta(i, j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j , \end{cases}$$

and the standard abbreviations $\delta^i_j$, $\delta_{ij}$ and $\delta^{ij}$ (according to varying convenience) for the real number $\delta(i, j)$. Thus, for instance,

$$\delta^1_1 = \delta_{22} = \delta^{55} = 1 , \qquad \delta^1_2 = \delta_{12} = \delta^{12} = 0 .$$


3. Physical Background

In 1887, Albert Abraham Michelson and Edward Williams Morley tried to 
measure the absolute velocity of the Earth through space, as follows. 

Light was believed to consist of movements, analogous to water or sound 
waves, of a luminiferous (= light-carrying) ether. (The name is descended 
deviously from the theories of Aristotle, in which heavenly bodies - only - 
are made of a luminous element, “ether” or “aether”, instead of terrestrial 
earth, air, fire and water. Such an element is rather unlike the 19th Century 
omnipresent something, whose only discernible property was carrying light 
by its oscillations.) Any attempt to allow currents or eddies in the ether led to 
the prediction of unobserved effects. Therefore it seemed reasonable to allow 
the ether to enjoy absolute rest, apart from its light-carrying oscillations. 
Hence an absolute velocity could be assigned to the Earth, as its velocity 
relative to the ether. Thus the crucial experiment is equivalent to measuring 
the flow of ether through the Earth. Since the ether was detectable only by 
its luminiferosity, any such measurements had to be of light waves.

The problem is analogous to that of measuring the speed of a river by 
timing swimmers who move at a constant speed, relative to the water, as light 
waves were believed to, relative to the ether. This constancy followed from 
the wave theory of light; Newton’s “light corpuscles” had no more reason for 
constant speed than bullets have. (In what follows, remember that “speed” 
is a number, while “velocity” is speed in a particular direction: the man who 
said he had been fined for a “velocity offence” had been driving below the 
speed limit, but down a wrong-way street.) The wave characteristics of light 
were also used essentially in the experiment; the times involved were too 
short to measure directly, but could be compared through wave interference 
effects. For the optical details we refer the reader to [Feynman], and limit 
ourselves here to the way the time comparisons were used. 

Suppose (Fig. 3.1) that we have three rigidly linked rafts moored in water flowing at uniform speed $v$, all in the same direction. The raft separations $AB$, $AC$ are at right angles, and each of length $L$. If a swimmer’s speed, relative to the water, is always $c$, her time from $A$ to $C$ and back will be

$$T_1 = \frac{L}{\text{upstream speed}} + \frac{L}{\text{downstream speed}} = \frac{L}{c - v} + \frac{L}{c + v} = \frac{2cL}{c^2 - v^2}$$

where the speeds $c - v$, $c + v$ are relative to the rafts. For swims from $A$ to $B$, the velocities add more awkwardly (Fig. 3.2). Then she achieves a cross current speed of $(c^2 - v^2)^{1/2}$ giving a time from $A$ to $B$ and back of

$$T_2 = \frac{2L}{(c^2 - v^2)^{1/2}} .$$

Fig. 3.2

Simple algebra then gives

$$v = c \sqrt{1 - \left( \frac{T_2}{T_1} \right)^2}$$

so that measurement of the ratio $T_1/T_2$ gives $v$ as a multiple of the “measuring standard” velocity $c$. Minor elaborations involving turning the apparatus take care of not knowing the current direction in advance, and the possibility that $AB \neq AC$.
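The swimmer calculation is easy to check numerically. The sketch below computes the two round-trip times from the speeds relative to the rafts and then recovers the current speed from their ratio, using $T_1/T_2 = c/(c^2 - v^2)^{1/2}$. The function names and the numerical values of `L`, `c` and `v` are hypothetical, chosen only to make the arithmetic visible.

```python
import math


def time_along(L, c, v):
    """Round trip A -> C -> A, parallel to the current."""
    return L / (c - v) + L / (c + v)


def time_across(L, c, v):
    """Round trip A -> B -> A, at right angles to the current."""
    return 2 * L / math.sqrt(c**2 - v**2)


def current_speed(c, T1, T2):
    """Recover v from the measured times, via T1/T2 = c / sqrt(c^2 - v^2)."""
    return c * math.sqrt(1 - (T2 / T1) ** 2)


L, c, v = 100.0, 2.0, 1.2      # hypothetical raft separation and speeds
T1 = time_along(L, c, v)
T2 = time_across(L, c, v)

print(T1, T2)                   # the along-current trip takes longer
print(current_speed(c, T1, T2))
```

Only the ratio of the two times enters `current_speed`, which is why the experiment could work by comparing times through interference rather than measuring either one directly.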

The analogous experiment with c as the enormous speed of light (which 
Michelson was brilliant at measuring) and v as the relative velocity of Earth 
and ether, required great skill. Repeated attempts, ever more refined, gave 
v = 0, even when the margin of error was held well below Earth’s orbital 
speed and the experiment repeated six months later with the Earth, halfway 
round the sun, going the other way. Thus two different velocities, v and — v, 
both appeared to be zero relative to the unmoving ether! 

In retrospect, this experiment is seen as changing physics utterly (though 
it did not strike Michelson that way). More and more ad hoc hypotheses 
had to be added to conventional physics to cope with it. The Irish physicist 
Fitzgerald proposed that velocity $v$ in any direction shrank an object’s length in that direction by $(1 - v^2/c^2)^{1/2}$. Hendrik Antoon Lorentz suggested the same (the effect is now known as the Lorentz-Fitzgerald contraction). He also saw that to save Newton’s law “force = mass times acceleration” the mass of a moving object had also to change, this time increased by the same factor.

Every effort to get round these effects and find an absolute velocity hit a new contradiction or a similar “fudge factor”, as though there were a conspiracy to conceal the answer. Henri Poincaré pointed out that “a successful conspiracy is itself a law of nature”, and in 1905 Albert Einstein proposed the theory now called Special Relativity. He assumed that it is completely impossible, by any means whatever, to discover for oneself an absolute velocity. Any velocity at all may be treated as “rest”. From this “Principle of
Relativity” he deduced all the previously ad hoc fudge factors in a coherent 
way. Moreover, he accounted effectively for a wide range of experimental 
facts - both those then known, and many learnt since. His theory is now 
firmly established, in the sense that any future theories must at least include 
it as a special case. For no experiment has contradicted those consequences of
the theory that have been elaborated to date. 

One such consequence caused great surprise at the time, and leads to a 
“spacetime geometry” which - even before gravity is considered - is different 
from the “space geometries” studied up to that time. It even differs from the 
generalised (non-Euclidean and n-dimensional) ones investigated in the 19th Century. By the Relativity Principle, every observer measuring the speed of light in vacuum must find the same answer. (Or Michelson and Morley would have got the results they expected.) Consider a flash of light travelling at uniform speed $c$, straight from a point $X_1$ to a point $X_2$. Then any observer will find the equation

$$c = \frac{\text{distance from } X_1 \text{ to } X_2}{\text{time taken by light flash}} \qquad (*)$$

exactly satisfied. But another observer may easily measure the distance differently, even on Newtonian assumptions, since “arrival” is later than “departure”. (A minister in a Concorde drops his champagne glass and it hits
the floor after travelling - to him - just three feet, downwards. But he drops 
it as he booms over one taxpayer, and it breaks over another, more than 
500 ft away.) Then the Principle requires that the time taken also be measured differently, to keep the same ratio c (using, we must obviously insist,
the same units for length, and time - otherwise one observer can change c 
however he chooses). This created controversy, above all because it implied 
that two identical systems (clocks or twins for instance) could leave the same 
point at the same moment, travel differently, and meet later after the passage 
of more time (measured by ticks or biological growth) for one than for the 
other. This contradicted previous opinion so strongly as to be miscalled the 
Clocks, or Twins, Paradox; cf. IX.4.05. (Strictly a paradox must be self-contradictory, like the logical difficulties that were troubling mathematics at
the time. Physics has had its share of paradoxes, such as finite quantities 
proved infinite, but this is not one of them. There is nothing logically wrong 
about contradicting authority, whether Church, State or Received Opinion, 
though it may be found morally objectionable.) With the techniques for 
producing very high speeds developed since 1905 - near lightspeed in particle accelerators - and atomic clocks of extreme accuracy, this dependence of
elapsed time on the measurer has been confirmed to many decimal places in 
innumerable experiments of very various kinds. We consider its geometrical 
aspects in Chap. IX (since it is a failure of geometric rather than physical 
insight that gives the feeling of “paradox”) and its more physical, quantitative aspects in Chap. XI. The following remarks explain some terminology
chosen in Chap. IV. 

Choose, for our first observer of the above movement of a light flash, coordinates $(x, y, z)$ for space and $t$ for time with $t = x = y = z = 0$ labelling “departure”. (We choose rectangular coordinates $(x, y, z)$ if we can, though this is usually only locally and approximately possible in the General theory. The discussion below then leads to the structure we attribute to spacetime “in the limit of smallness” where the approximations disappear, so it remains satisfactory for motivation.) In these coordinates, “arrival” is labelled by the four numbers $(t, x, y, z)$. Then equation * becomes, using Pythagoras’s theorem,

$$c = \frac{\sqrt{x^2 + y^2 + z^2}}{t}$$

or equivalently

$$c^2 t^2 - x^2 - y^2 - z^2 = 0 .$$


The Principle requires this to be equally true for an observer using different coordinates with the same origin, “departure” labelled by $(0, 0, 0, 0)$, but giving a new label $(t', x', y', z')$ to “arrival”. As remarked above, $t'$ will in general be different from $t$. But we must still have

$$c^2 (t')^2 - (x')^2 - (y')^2 - (z')^2 = 0$$


with the same value of $c$. It follows fairly easily (the more mathematical reader should prove it) that there is a positive number $S$ such that for any “time and position” labelled $(T, X, Y, Z)$ by one system and $(T', X', Y', Z')$ by the other, not just the possible “arrival” points of light flashes with “departure” $(0, 0, 0, 0)$, we have

$$c^2 (T')^2 - (X')^2 - (Y')^2 - (Z')^2 = S \, (c^2 T^2 - X^2 - Y^2 - Z^2) .$$

Now the Principle requires that both systems use the same units; in particular they must give lengths

$$\sqrt{(X')^2 + (Y')^2 + (Z')^2} = \sqrt{X^2 + Y^2 + Z^2}$$

to points which they agree are at time zero, so that $T' = T = 0$. There always will be such points (a proof needs some machinery from Chapters I-IV), so $S$ can be only 1. Up to choice of unit, then,

$$c^2 T^2 - X^2 - Y^2 - Z^2$$

is a quantity which, unlike $T, X, Y, Z$ individually, does not depend on the labelling system. This is in close analogy to the familiar fact of three-dimensional analytic geometry, used above, that

$$x^2 + y^2 + z^2 = (\text{distance from origin})^2$$

does not depend on the rectangular axes chosen. It is common nowadays 
to strengthen the analogy by choosing units to make c = 1. For instance, 
measuring time in years and distance in light years, the speed of light becomes 
exactly 1 light year per year by definition. Or as in [Misner, Thorne and 
Wheeler], time may be measured in centimeters - in multiples of the time 
in which light travels 1 cm in vacuum. (Such mingling of space and time 
is ancient in English, though “a length of time” is untranslatable into some 
languages, but is only fully consummated in Relativity.) This practice gives 
the above quantity the standard form 

$$T^2 - X^2 - Y^2 - Z^2$$

independently even of units (though its value at a point will depend on 
whether your scale derives from the year or from the Pyramid Inch.) We 
examine the geometry of spaces with label-independent quantities like this 
one and like Euclidean length, from Chap. IV onwards. 
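A numerical illustration of this label-independence may help. The text has not yet said how two observers' labels are related, so the sketch below assumes the standard Lorentz boost along the $x$-axis, in units with $c = 1$; the boost formula, the function names and the sample event are assumptions brought in for illustration, not results derived so far in this chapter.

```python
import math


def boost_x(event, v):
    """Relabel an event (T, X, Y, Z) for an observer moving at speed v along x.

    Assumes units with c = 1 and |v| < 1 (the standard Lorentz boost).
    """
    T, X, Y, Z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (T - v * X), gamma * (X - v * T), Y, Z)


def interval(event):
    """The quantity T^2 - X^2 - Y^2 - Z^2."""
    T, X, Y, Z = event
    return T * T - X * X - Y * Y - Z * Z


event = (5.0, 1.0, 2.0, 3.0)          # hypothetical labels in the first system
relabelled = boost_x(event, 0.6)

print(event, interval(event))
print(relabelled, interval(relabelled))
```

The individual labels $T$ and $X$ change under the relabelling while the combination computed by `interval` does not, which is the behaviour the paragraph describes.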

Two ironies: Michelson lived to 1933 without ever accepting Relativity. 
Modern astronomers, who accept it almost completely, expect in the next 
decade or so to measure something very like an “absolute velocity” for the 
Earth. This derives from the Doppler shift (cf. XI.2.09) of the amazingly 
isotropic universal background of cosmic black body radiation.


I. Real Vector Spaces


“To banish reality is to sink deeper into the real; 
allegiance to the void implies denial of its voidness.” 

Seng-ts’an 


1. Spaces 


1.01. Definition. A real vector space is a non-empty set $X$ of things we call vectors and two functions

“vector addition” : $X \times X \to X : (x, y) \mapsto x + y$
“scalar multiplication” : $X \times \mathbb{R} \to X : (x, a) \mapsto xa$

such that for $x, y, z \in X$ and $a, b \in \mathbb{R}$ we have

(i) $x + y = y + x$, (commutativity of +).

(ii) $(x + y) + z = x + (y + z)$, (associativity of +).

(iii) There exists a unique element $0 \in X$, the zero vector, such that for any $x \in X$ we have $x + 0 = x$, (+ has an identity).

(iv) For any $x \in X$, there exists $(-x) \in X$ such that $x + (-x) = 0$, (+ admits inverses).

(v) For any $x \in X$, $x1 = x$.

(vi) $(x + y)a = xa + ya$, (distributivity).

(vii) $x(a + b) = xa + xb$, (distributivity).

(viii) $(xa)b = x(ab)$.


This long list of axioms does not mean that a vector space is immensely 
complicated. Each one of them, properly considered, is a rule that something 
difficult should not happen. English breaks (i), since

$$\text{killer rat} \neq \text{rat killer}$$

and similarly (ii), since

$$\text{killer of young rats} = (\text{young rat})\text{killer} \neq \text{young}(\text{rat killer}) = \text{young killer of rats} .$$

In consequence the objects that obey all of them are beautifully simple, and the theory of them is the most perfect and complete in all of mathematics
(particularly for “finite-dimensional” ones, which we come to in a moment 
in 1.09). The theory of objects obeying only some of these rules is very much 
harder. English, which obeys none of them, is only beginning to acquire a 
formal theory. 

If a vector space is “finite-dimensional” it may be thought of simply and effectively as a geometrical rather than an algebraic object; the vectors are “directed distances” from a point 0 called the origin, vector addition is defined by the parallelogram rule and scalar multiplication by $xa$ = “(length of $x$) × $a$ in the direction of $x$”. All of linear algebra (alias,
sometimes, “matrix theory”) is just a way of getting a grip with the aid of 
numbers on this geometrical object. We shall thus talk of geometrical vectors 
as line segments: they all have one end at 0, and we shall always draw them 
with an arrowhead on the other. To forget the geometry and stop drawing 
pictures is voluntarily to create enormous problems for yourself - equal and 
opposite to the difficulties the Greeks had in working with raw geometry 
alone, with no use of coordinates at all. (Often other pictures than arrows 
will be appropriate, as with vectors in the dual space discussed in Chapter III. 
But reasoning motivated by the arrow pictures, within any particular vector 
space, remains useful.) 



In this context, the real numbers used are called scalars. The only reason 
for not calling them just “numbers”, which would adequately distinguish 
them from vectors, is that for historical reasons nobody else does, and in 
mathematics as in other languages the idea is to be understood. 

The term real vector space refers to our use of R as the source of scalars. 

We shall use no others (and so henceforth we banish the “real” from the 
name), but other number systems can replace it: for instance, in quantum 
mechanics vector spaces with complex scalars are important. We recall that 
R is algebraically a field (cf. Exercise 10). 

Notice that properties (ii), (iii) and (iv) are sufficient axioms for a vector 
space to be a group under addition; property (i) implies that this group is 
commutative (cf. Exercise 10). 

1.02. Definition. The standard real n-space $\mathbb{R}^n$ is the vector space consisting of ordered n-tuples $(x^1, \dots, x^n)$ of real numbers as on p. , with its operations defined by

$$(x^1, \dots, x^n) + (y^1, \dots, y^n) = (x^1 + y^1, \dots, x^n + y^n)$$

$$(x^1, \dots, x^n)a = (ax^1, \dots, ax^n)$$

(cf. Exercise 1). 
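Exercise 1 asks for a proof; a quick numerical spot check of some of the axioms of 1.01 is no substitute, but it shows what the componentwise operations do. In this sketch the tuples, the scalars and the function names `add` and `scale` are hypothetical, and integer entries are used so that equality is exact.

```python
def add(x, y):
    """Componentwise vector addition in R^n."""
    return tuple(xi + yi for xi, yi in zip(x, y))


def scale(x, a):
    """Scalar multiplication xa in R^n."""
    return tuple(a * xi for xi in x)


x, y, z = (1, 2, 3), (4, 5, 6), (7, 8, 9)   # sample vectors in R^3
a, b = 2, 3                                  # sample scalars

print(add(x, y), scale(x, a))

# Spot checks of axioms (i), (ii), (vi), (vii), (viii) of Definition 1.01:
print(add(x, y) == add(y, x))
print(add(add(x, y), z) == add(x, add(y, z)))
print(scale(add(x, y), a) == add(scale(x, a), scale(y, a)))
print(scale(x, a + b) == add(scale(x, a), scale(x, b)))
print(scale(scale(x, a), b) == scale(x, a * b))
```

Each check reduces to the corresponding property of real numbers applied in every component, which is the idea behind the proof requested in the exercise.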

1.03. Definition. A subspace of a vector space $X$ is a non-empty subset $S \subseteq X$ such that

$$x, y \in S \Rightarrow (x + y) \in S$$

$$x \in S, \; a \in \mathbb{R} \Rightarrow xa \in S .$$

For instance, in a three-dimensional geometrical picture, the only subspaces are the following.

(1) The directed distances from origin 0 to points in a line through 0, 
(a line subspace). 

(2) The directed distances from origin 0 to points in a plane through 
0, (a plane subspace). 

(3) The trivial subspaces: the whole space itself, and the zero subspace 
(consisting of the zero vector 0 alone). By Exercise 2a this is contained in every other subspace.

Sets of directed distances to lines and planes not through 0 are examples 
of subsets which are not subspaces, (cf. Exercise 2). Nor are sets like S 
(cf. Fig. 1.2). 



1.04. Definition. The linear hull of any set $S \subseteq X$ is the intersection of all
the subspaces containing S. It is always a subspace of X (cf. Exercise 3a). 
We shall also say that it is the subspace spanned by the vectors in S. 

Thus, for instance, the linear hull of a single vector is the intersection of 
all line subspaces and plane subspaces etc. that contain it, which is clearly 
just the line subspace in the direction of the vector. Similarly the linear hull 
of two non-zero vectors is the plane subspace they define as two line segments 
(Fig. 1.3) unless they are in the same or precisely opposite direction, in which 
case it is the line subspace in that direction. The linear hull of three vectors may be three-dimensional, a plane subspace, a line subspace or (if they are all zero) the zero subspace.

Fig. 1.3

1.05. Definition. A linear combination of vectors in a set $S \subseteq X$ is a finite sum $x_1 a^1 + x_2 a^2 + \dots + x_n a^n$, where $x_1, \dots, x_n \in S$ and $a^1, \dots, a^n \in \mathbb{R}$. (cf. Exercise 3b)



1.06. Notation. The summation convention (invented by Einstein) represents $x_1 a^1 + \dots + x_n a^n$ by $x_i a^i$, and in this book $x_i a^i$ will always represent such a sum. (Be warned: this is mainly a physicist’s habit. Mathematicians mostly use $\sum_{i=1}^{n} x_i a^i$ for sums, and by $x_i a^i$ would mean $x_1 a^1$, or $x_2 a^2$, or … $x_n a^n$. There are good arguments for either, and if you go further you will meet both. We shall always favour physicist’s notation in this book, unless it is hopelessly destructive of clarity.) Evidently $x_j a^j$ or $x_\alpha a^\alpha$ represent the same sum equally well, as long as we know what $x_1, \dots, x_n$ and $a^1, \dots, a^n$ are, and so $i$, $j$, $\alpha$ etc. are often called dummy indices, to emphasise that while $x_i$ need not be the same vector as $x_j$, $x_i a^i$ is always the same as $x_j a^j$. It is often convenient to “change dummy index” in the middle of a computation - this makes use without explicit mention of the identity $x_i a^i = x_j a^j$.

The convention does not apply only to writing down linear combinations. For example, if we have real-valued functions $f_1, \dots, f_n$ and $g^1, \dots, g^n$, then $f_i g^i$ is short for $f_1 g^1 + \dots + f_n g^n$. (This expression will emerge in later chapters as the value of a covariant vector field applied to a contravariant one - we have not forsaken geometry.) Invariably, even if there are a lot of other indices around, $a^{ij}_k b_j$ for instance will mean $a^{i1}_k b_1 + \dots + a^{in}_k b_n$ where $n$ is (hopefully) clear from the context. The Kronecker function (cf. Chap. 1.2) often crops up with the summation convention, as $x_i \delta^i_j = x_j$ in change of index for example; beware however, $\delta^i_i = n$.

Notice that the convention applies only if we have one upper and one lower index (though $a_i b^i$ means the same as $b^i a_i$ - order does not signify); if we want to abbreviate $x^1 y^1 + x^2 y^2 + \dots + x^n y^n$ we must use $\sum_{i=1}^{n} x^i y^i$. This is not as daft as it seems. The position of the indices usually has geometrical significance, and the two kinds of sum then represent quite different geometrical ideas: Chapter III is about one, Chapter IV about the other.
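For readers who think more easily in loops than in indices, the summation convention can be unpacked as follows. This is only a sketch: Python lists carry no distinction between upper and lower indices, so the index positions are recorded in comments, and the names `contract` and `delta` and the sample numbers are hypothetical.

```python
def contract(x, a):
    """x_i a^i: the sum over the repeated (dummy) index i."""
    return sum(x_i * a_i for x_i, a_i in zip(x, a))


def delta(i, j):
    """The Kronecker function: 1 if i == j, otherwise 0."""
    return 1 if i == j else 0


n = 3
x = [2, 5, 7]      # x_1, ..., x_n   (lower indices)
a = [1, 0, 4]      # a^1, ..., a^n   (upper indices)

print(contract(x, a))       # x_i a^i = 2*1 + 5*0 + 7*4

# Renaming the dummy index changes nothing: x_i a^i = x_j a^j.
print(sum(x[j] * a[j] for j in range(n)) == contract(x, a))

# x_i delta^i_j = x_j: contracting with delta just changes the index.
print([sum(x[i] * delta(i, j) for i in range(n)) for j in range(n)])

# delta^i_i is a sum over i, so it equals n, not 1.
print(sum(delta(i, i) for i in range(n)))
```

The last line is the warning in the text made concrete: a repeated index always means a sum, even on the Kronecker function.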

1.07. Definition. A subset $S \subseteq X$ is linearly dependent if some vector in $S$ is a linear combination of other vectors in $S$. (Notice that 0 is always a linear combination of any other vectors, for $0 = x0 + y0$, so any set containing 0 is linearly dependent if we say for tidiness that $\{0\}$ is linearly dependent too.) Equivalently (Exercise 4), $S$ is linearly dependent if and only if there is a linear combination $x_i a^i = 0$ of vectors $x_i \in S$, with not all the $a^i = 0$. If $S$ is not linearly dependent, it is linearly independent.

Geometrical example: a set of three vectors in the same plane through 
the origin is always linearly dependent. To have three independent directions 
(only the directions of the vectors in S matter for dependence, not their 
lengths; why?) we need more room. This leads us to 

1.08. Definition. A subset $\beta \subseteq X$ is a basis for $X$ if the linear hull of $\beta$ is all of $X$, and $\beta$ is linearly independent.

Intuitively, it is clear that a basis for a line subspace must have exactly 
one member, whereas a plane subspace requires vectors in two directions, and 
so forth; the number of independent vectors you can get, and the number you 
need to span the space, will correspond to the “dimension” of the space. Now 
our concept of dimension does not rely on linear algebra. It is much older and 
more fundamental. What we must check, then, is not so much that our ideas 
of dimension are right as that linear algebra models nicely our geometrical 
intuition. The algebraic proof of the geometrically visible statement that if 
X has a basis consisting of a set of n vectors, any other basis also contains n 
vectors, is indicated in Exercises 5-7. (The same sort of proof goes through 
for infinite dimensions, but we shall stick to finite ones.) Hence we can define 
dimension algebraically, which is a great deal easier than making precise 
within geometry the “concept of dimension” we have just been so free with. 
But remember that this is an algebraic convenience for handling a geometrical 
idea. 

1.09. Definition. If $X$ has a basis with a finite number $n$ of vectors, then $X$ is finite-dimensional and in particular n-dimensional. Thus $\mathbb{R}^3$ is 3-dimensional, by Exercise 9. The number $n$ is the dimension of $X$. We shall assume all vector spaces we mention to be finite-dimensional unless we specifically indicate that they are not. If a subspace of $X$ has dimension ($\dim X - 1$) it is called a hyperplane of $X$ by analogy with plane subspaces of $\mathbb{R}^3$. (What are the hyperplanes of $\mathbb{R}^2$?)

1.10. Definition. The standard basis $\mathcal{E}$ for $\mathbb{R}^n$ (cf. Definition 1.02) is the set of $n$ vectors $e_1, \dots, e_n$ where $e_i = (0, \dots, 0, 1, 0, \dots, 0)$ with the 1 in the i-th place. (cf. Exercise 9)


Exercises 1.1 

1. The standard real n-space is indeed a vector space. 

2. a) Any subspace of a vector space must include the zero 0. 

b) The set $\{0\} \subseteq X$ is always a subspace of $X$.

3. a) The linear hull of $S \subseteq X$ is a subspace of $X$.

b) The linear hull of $S \subseteq X$ is exactly the set of all linear combinations of vectors in $S$.

c) A subset $S$ is a subspace of $X$ if and only if it coincides with its linear hull.

4. Prove the equivalence of the alternative definitions given in 1.07. 

5. A subset of a linearly independent set of vectors is also independent. 

6. If $\beta$ is a basis for $X$ then no subset of $\beta$ (other than $\beta$ itself) is also a basis for $X$.

7. a) If $\beta = \{x_1, \dots, x_n\}$ and $\beta' = \{y_1, \dots, y_m\}$ are bases for $X$ then so is $\{y_1, x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_n\}$ for some omitted $x_j$. Notice that the new basis, like $\beta$, has $n$ members.

b) Prove that if $k \leq n$, then a set consisting of $y_1, \dots, y_k$ and some suitable set of $(n - k)$ of the $x_i$’s is a basis for $X$. Deduce that $m \leq n$.

c) Prove that $m = n$.

8. If $\beta$ is a basis for $X$ then any vector in $X$ is, in a unique way, a linear combination of vectors in $\beta$. If therefore,

$$x_i a^i = y = x'_j b^j ,$$

where the $x_i, x'_j \in \beta$ then the non-zero $a^i$’s are equal to non-zero $b^j$’s and multiply the same vectors. They are called the components of $y$ with respect to $\beta$.

9. Prove that $\{e_1, \dots, e_n\}$ is a basis for $\mathbb{R}^n$.

10. A group $(X, *)$ is a non-empty set $X$ and a map $* : X \times X \to X$ such that $*$ is associative, has an identity, and admits inverses. Thus $(\mathbb{R}, +)$ and $(\mathbb{R} \setminus \{0\}, \times)$ are groups and in fact this double group structure makes the real numbers a field because + and × interact in a distributive way.


2. Maps

In almost all mathematical theories we have two basic tools: sets with a 
particular kind of structure, and functions between them that respect their 
structure. We shall meet several examples of this in the course of the book. 
For sets with a vector space structure, the functions we want are as follows. 

2.01. Definition. A function $A : X \to Y$ is linear if for all $x, y \in X$ and $a \in \mathbb{R}$ we have

$$A(x + y) = Ax + Ay$$

$$A(xa) = (Ax)a .$$

The terms linear map or mapping, linear transformation and linear operator 
for such functions are frequent, though the latter is generally reserved for 
maps A : X —► X, which “operate” on X. (It is also the favourite term in 
books which discuss, for example, quantum mechanics in terms of operators 
without ever saying what they operate on. This is perhaps intended to make 
things easier.) We shall use “linear map” for a general linear function $X \to Y$, “linear operator” in the case $X \to X$, omitting “linear” like “real” where no confusion is created.

The set $L(X; Y)$ of all linear maps $X \to Y$ itself forms a vector space under the addition and scalar multiplication

$$(A : X \to Y) + (B : X \to Y) = A + B : X \to Y : x \mapsto Ax + Bx$$

$$(A : X \to Y)a = Aa : X \to Y : x \mapsto (Ax)a$$

as is easily checked. So is the fact that the composite $BA$ of linear maps $A : X \to Y$, $B : Y \to Z$ is again linear. We show in 2.07 that $\dim L(X; Y) = \dim X \cdot \dim Y$.

2.02. Definition. The identity operator $I_X$ on $X$ is defined by $I_X(x) = x$ for all $x$. We shall denote it by just $I$ when it is clear which space is involved. A scalar operator is defined for every $a \in \mathbb{R}$ by $(Ia)x = xa$ for all $x \in X$. Such an operator is abbreviated to $a$, so that $xa = ax$.

The zero map $0 : X \to Y$ is defined by $0x = 0$.

A linear map $A : X \to Y$ is an isomorphism if there is a map $B : Y \to X$ such that both $AB = I_Y$ and $BA = I_X$. (Notice that it is possible to have one but not the other: if $A : \mathbb{R}^2 \to \mathbb{R}^3 : (x, y) \mapsto (x, y, 0)$ and $B : \mathbb{R}^3 \to \mathbb{R}^2 : (x, y, z) \mapsto (x, y)$, then $BA = I_{\mathbb{R}^2}$ but $AB \neq I_{\mathbb{R}^3}$.) We read $A : X \cong Y$ as “$A$ is an isomorphism from $X$ to $Y$”.

Such a $B$ is the inverse of $A$ and we write $B = A^{\leftarrow}$. $A$ is then invertible. $A : X \to Y$ is non-singular if $Ax = 0$ implies $x = 0$, otherwise singular. (cf. Exercise 1) Evidently an invertible map is non-singular.

If $x \neq 0$, $Ax = 0$ then $x$ is a singular vector of $A$.

2.03. Lemma. A function $A : X \to Y$ is an isomorphism if and only if it is a linear bijection.

Proof. If $A$ is a bijection, there exists $B : Y \to X$ (not necessarily linear) such that $AB = I_Y$, $BA = I_X$. If $A$ is also linear, consider $y, y' \in Y$. For some $x, x' \in X$ we have $y = Ax$, $y' = Ax'$, since $A$ is surjective, and $y + y' = Ax + Ax' = A(x + x')$, ($A$ linear). So $B(y + y') = BA(x + x') = I(x + x') = x + x' = (BA)x + (BA)x' = B(Ax) + B(Ax') = By + By'$. Similarly $B(ya) = (By)a$, and hence $B$ is linear. Conversely, an isomorphism is linear by definition and a bijection by the existence of its inverse. □

2.04. Corollary. If A is non-singular and surjective, it is an isomorphism. 

Proof Non-singularity implies that A is injective by Exercise 1, and hence 
bijective. The result follows. □ 

2.05. Lemma. If $\beta$ is a basis for $X$, then (i) any linear map $A : X \to Y$
is completely specified by its values on $\beta$, and (ii) any function $A : \beta \to Y$
extends uniquely to a linear map $A : X \to Y$.

Proof. Let $x \in X$ be the linear combination $b_i a^i$ of elements of $\beta$. By
linearity of $A$, $Ax = A(b_i a^i) = (Ab_i)a^i$, which depends only on $x$ (via the
scalars $a^i$) and the vectors $Ab_i$. Thus $A$ is fixed if we know its values on $\beta$,
and since $x = b_i a^i$ in a unique way (Exercise 1.8), we can without ambiguity
define $A$ by $Ax = (Ab_i)a^i$, and check that $A$ so defined is linear. □




Geometrically: think of a parallelogram or parallelepiped linkage attached
to the origin. Move the basic vectors x, y, z around, and their sums
(given by the parallelogram law) are forced to move in a corresponding way.
If not only the parallelogram law but also scalar multiplication is to be preserved,
it is clear that defining an operation on basic vectors is enough to
determine it everywhere.

2.06. Corollary. If $X$ is an n-dimensional vector space, there is an isomorphism
$A : X \to R^n$.

Proof. Pick any basis $\{b_1, \ldots, b_n\}$ for $X$, then define

$$A : \{b_1, \ldots, b_n\} \to \{e_1, \ldots, e_n\} : b_i \mapsto e_i , \qquad B : \{e_1, \ldots, e_n\} \to \{b_1, \ldots, b_n\} : e_i \mapsto b_i .$$

The functions $A$ and $B$ extend to linear maps $A$ and $B$ between $X$ and $R^n$, and if $x \in X$,
$y \in R^n$ we have

$$BAx = BA(b_i a^i) = B((Ab_i)a^i) = B((e_i)a^i) = (Be_i)a^i = b_i a^i = x$$

so that $BA = I_X$, and similarly $AB = I_{R^n}$. □

2.07. Matrices. By the last lemma any finite-dimensional space is a copy
of $R^n$ - so why not just use $R^n$, instead of all this stuff about vector spaces?
The reason is that to get the isomorphism $A$ you had to choose a basis, and
an ordering for it. Once you have done that, you have “chosen coordinates”
on $X$, because you can label a vector by its image $Ax = (a^1, \ldots, a^n)$. (In
the presence of a basis we shall use such labels quite often, sometimes abbreviating
them to a single representative $a^i$.) But there may be no particular
reason for choosing any one ordered basis (as in interplanetary space, for 
instance) or - worse - good reasons for several different ones. Moreover, it is 
often easier to see what is going on if a basis is not brought in. However for 
specific computations a basis is usually essential, so the best approach is to 
work with a general vector space and bring in or change a basis as and when 
convenient. 

A basis enables us to write down vectors in an n-dimensional vector
space $X$ conveniently as n-tuples of numbers, and to specify a map $A$ to
an m-dimensional space $Y$ by what it does to just the set of $n$ basis vectors
$\{b_1, \ldots, b_n\}$. This involves giving an ordered list of the $n$ vectors

$$A(b_j) = c_i a^i_j = c_1 a^1_j + \cdots + c_m a^m_j = (a^1_j, \ldots, a^m_j)$$

“in coordinates” according to an ordered basis $c_1, \ldots, c_m$ for $Y$. Given this,
we know that for a general $x = b_j x^j = (x^1, \ldots, x^n)$ “in coordinates” we have

$$Ax = A(b_j x^j) = (Ab_j)x^j = (c_i a^i_j)x^j = (a^1_j x^j, \ldots, a^m_j x^j) .$$

Thus $A$ is specified in this choice of coordinates by the $mn$ numbers $a^i_j$. It
is convenient to lay these out in the $m$ by $n$ rectangle, or matrix

$$\begin{bmatrix} a^1_1 & a^1_2 & \cdots & a^1_n \\ a^2_1 & a^2_2 & \cdots & a^2_n \\ \vdots & \vdots & & \vdots \\ a^m_1 & a^m_2 & \cdots & a^m_n \end{bmatrix} = [a^i_j], \ [A], \text{ or } A \text{ for short.}$$

If we have not already labelled the entries of, say, $[B]$, and want to refer
to its entry in the i-th row and j-th column we shall call it $[B]^i_j$. Notice
that the columns here are just the vectors $A(b_j)$, written in “$c_1, \ldots, c_m$
coordinates”. If in a similar way the vector $x = (x^1, \ldots, x^n)$ in “$b_1, \ldots, b_n$
coordinates” is written as a column matrix $\begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix}$, then the rule for finding
$Ax$ in coordinates is exactly the rule for “matrix multiplication” (cf. also
Exercise 2). By 2.05, once we have chosen bases every map $A$ has a matrix
$[A]$ and every matrix defines a map.
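
The statement that the columns of the matrix are the images of the basis vectors can be illustrated with a short sketch in Python. It assumes the standard bases of $R^n$ and $R^m$; the helper names `matrix_of` and `apply`, and the map `example_map`, are hypothetical and chosen only for illustration.

```python
def matrix_of(linear_map, n):
    """Matrix (as a list of rows) of a linear map R^n -> R^m in the standard bases.

    Column j is the image of the j-th standard basis vector.
    """
    columns = []
    for j in range(n):
        e_j = tuple(1 if k == j else 0 for k in range(n))
        columns.append(linear_map(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(matrix, x):
    """Matrix times column vector: the i-th entry is the sum over j of a^i_j x^j."""
    return tuple(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in matrix)


def example_map(v):
    # Hypothetical linear map R^3 -> R^2, not taken from the text.
    x, y, z = v
    return (x + 2 * y, 3 * z - y)


M = matrix_of(example_map, 3)
same_answer = apply(M, (1, 1, 1)) == example_map((1, 1, 1))
```

Here `M` has two rows and three columns, matching a map from a 3-dimensional space to a 2-dimensional one.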

If we define, in terms of these bases for $X$ and $Y$, the $mn$ linear maps
$L^i_j$ such that

$$L^i_j(x^1 b_1 + \cdots + x^n b_n) = x^i c_j$$

with the matrix for $L^i_j$ having the entry 1 in the j-th row and i-th column and 0 everywhere else,

$$\begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 \end{bmatrix} ,$$

we get a basis for $L(X;Y)$ since

$$A = a^j_i L^i_j$$

using the usual addition for maps (2.01; cf. also Exercise 3).

Thus the $a^i_j$ are just the components of $A$, considered as a vector in
the mn-dimensional space $L(X;Y)$, with respect to the basis induced by
those chosen for $X$ and $Y$. Notice that we have proved, for general finite-dimensional
$X$ and $Y$, that

$$\dim(L(X;Y)) = \dim X \cdot \dim Y .$$

The identity on $X$, regarded as a map from “$X$ with basis $\beta$” to “$X$
with basis $\beta$” must always have the matrix

$$\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}$$

whatever $\beta$ is. All that is involved is the use of the same basis at each “end”
of the identity map. This matrix is therefore called the identity matrix.

Notice that the matrix representing a map from X to Y depends on 
the particular basis chosen for each. If several bases have got involved, it is 
sometimes useful to label a matrix representation according to the particular 
bases we are using. Thus we write the matrix for $A$, via bases $\beta$, $\beta'$ for
$X$, $Y$ respectively, as $[a^i_j]^{\beta'}_{\beta}$. Then, if we have the matrix $[b^r_s]^{\beta''}_{\beta'}$, similarly
representing $B : Y \to Z$, the representations fit nicely and we have $BA$
represented by $[b^r_s a^s_j]^{\beta''}_{\beta} = [b^r_s]^{\beta''}_{\beta'} [a^i_j]^{\beta'}_{\beta}$, with the basis $\beta'$ “summed over” and
vanishing in the final result like the numbers $s$ or $i$ it is indexed by. If two
different bases for Y are involved in defining the two matrices we can still 
algebraically “multiply” them but we cannot expect it to mean very much. 
For this and other purposes, we need to be able to change basis. 

2.08. Change of Bases. If we have bases $\beta$, $\beta'$ for $X$, changing from $\beta$ to
$\beta'$ involves simply looking at the identity map $I : X \to X$ as a map from
“$X$ with basis $\beta$” (call it $(X, \beta)$ for short) to $(X, \beta')$. This we can represent,
just as in the last section, by $[I]^{\beta'}_{\beta}$. That is a matrix whose columns are
the vectors $I(b_i)$, for $b_i \in \beta$, written in $\beta'$-coordinates. But as $I(b_i) = b_i$,
this just means the coordinates of the vectors in $\beta$ in terms of the basis $\beta'$.
Multiplying the column matrix $[x]^{\beta}$, representing $x$ according to $\beta$, by the
$n \times n$ matrix $[I]^{\beta'}_{\beta}$ gives the column matrix representing $x$ according to $\beta'$:

$$[I]^{\beta'}_{\beta} [x]^{\beta} = [x]^{\beta'} .$$

There is a sneaky point here: most often when changing bases you are
given the new basis vectors, $\beta'$, in terms of the old basis $\beta$, rather than the
other way about. Putting these n-tuples of numbers straight in as columns
of a matrix gives you not the matrix $[I]^{\beta'}_{\beta}$ of the change you want, but the
matrix $[I]^{\beta}_{\beta'}$ for changing back. To get $[I]^{\beta'}_{\beta}$ you need to find the inverse of
$[I]^{\beta}_{\beta'}$, since clearly

$$[I]^{\beta'}_{\beta} [I]^{\beta}_{\beta'} = [II]^{\beta'}_{\beta'} = [I]^{\beta'}_{\beta'} = [\delta^i_j] .$$

(Fortunately this inversion is one computation we shall not need to do explicitly;
we shall just denote the inverse of any $[a^i_j]$, if it has one, by $[\tilde a^i_j]$,
“defining the $\tilde a^i_j$’s as the solutions of the $n^2$ equations $a^i_j \tilde a^j_k = \delta^i_k$”. This is
a physicist’s habit, except for the ~’s we have put in. In many physics books
and articles you have to remember that “$a^i_j$ when $j = 2$, $i = 3$” is not
the same as “$a^j_i$ when $j = 3$ and $i = 2$”. If you find that peculiar, put
the ~’s in yourself.) This need for the inverse is somewhat unexpected when
first met and you should get as clear as you can where it comes from. The 
nineteenth century workers were really bogged down in it, in the absence 
of the right pictures. The worst pieces of language we are stuck with in 
tensor analysis started right there. We discuss this further in III.1.07 and 
VII.4.04. 

It is important to be conscious that although matrix multiplication generalises
the ordinary kind, each entry $\tilde a^i_j$ of $[a^i_j]^{\leftarrow}$ depends on the whole matrix
$[a^i_j]$. It is not just the multiplicative inverse $(a^i_j)^{-1}$ (as is emphasised by
Exercise 7). This point does not seem deep when we are discussing only linear
algebra, but in the differential calculus of several variables it has sometimes 
caused real confusion (see VII.4.04(2)). 

So, $[I]^{\beta'}_{\beta}$ changes the representation of a vector. To change that of an
operator, so as to apply it to vectors given in terms of $\beta'$, just change the
vectors to the old coordinates, operate, and change back:

$$[A]^{\beta'}_{\beta'} = [I]^{\beta'}_{\beta} [A]^{\beta}_{\beta} [I]^{\beta}_{\beta'} ,$$

or equivalently $a'^i_j = \tilde b^i_k a^k_l b^l_j$, where $[b^i_j] = [I]^{\beta}_{\beta'}$ and $\tilde b^i_k \delta^k_l b^l_j = \delta^i_j$.

When two matrices are related by an equation of this kind, $P = RQR^{\leftarrow}$
for some invertible matrix $R$, they are called similar. Thus we have shown
that the matrices representing a map according to different bases are similar.
Conversely, any pair of similar matrices can be obtained as representations
of the same map (Exercise 6), so the two concepts correspond precisely.
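
The “sneaky point” about the inverse can be made concrete with a small sketch in Python. The new basis $b'_1 = (1, 1)$, $b'_2 = (1, -1)$ of $R^2$ and the operator (reflection in the line $x = y$) are choices made here for illustration, and the helpers `matmul` and `inverse_2x2` are hypothetical names.

```python
from fractions import Fraction


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# New basis vectors written in the old (standard) basis and put in as columns.
# This is the matrix for changing BACK, from new coordinates to old ones.
new_to_old = [[1, 1],
              [1, -1]]

# The change actually wanted, old coordinates to new ones, is its inverse.
old_to_new = inverse_2x2(new_to_old)

# Coordinates of the vector (3, 1) in the new basis.
x_new = matmul(old_to_new, [[3], [1]])

# Reflection in the line x = y, in the old basis.
A_old = [[0, 1],
         [1, 0]]

# Change to old coordinates, operate, change back.
A_new = matmul(old_to_new, matmul(A_old, new_to_old))
```

In this example `A_new` comes out diagonal, because the chosen new basis vectors lie along and across the line of reflection.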

2.09. Definition. The kernel $\ker A$ of $A : X \to Y$ is the subspace
$\{ x \in X \mid Ax = 0 \}$ of singular vectors of $A$. Note that by Exercise 1 (an easy but
very important exercise), $A$ is injective if and only if $\ker A = \{0\}$.

The image $AX$ of $A$ is the subspace $\{ y \in Y \mid y = Ax \text{ for some } x \in X \}$.
(cf. Exercise 4)

The nullity $n(A)$ of $A$ is $\dim(\ker A)$, the dimension of the kernel.

The rank $r(A)$ of $A$ is $\dim(AX)$, the dimension of the image.

Geometrically, in the case $A : R^2 \to R^2 : (x, y) \mapsto (2(x - y), (x - y))$, for
example, see Fig. 2.2.

The image and the kernel are as shown, and the rank and the nullity 
each 1. This illustrates a general proposition; the number of directions you 
squash flat, plus the number of directions you are left pointing in, is the 
number of directions you started with. More formally: 

Fig. 2.2

2.10. Theorem. For any finite-dimensional vector space $X$ and linear map
$A : X \to Y$, we have

$$n(A) + r(A) = \dim X .$$

Proof. An exercise in shuffling bases, and left as such. (Exercise 5) □ 
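
As a numerical companion to the theorem, here is a minimal sketch in Python for the example map of 2.09, $(x, y) \mapsto (2(x - y), x - y)$. The function `rank` (Gaussian elimination over the rationals) and the helper `apply` are written for this illustration. Note that the nullity is read off as the number of pivot-free columns, so this is a consistency check against an explicitly exhibited kernel vector and not a proof.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def apply(matrix, x):
    """Matrix times column vector."""
    return tuple(sum(a * b for a, b in zip(row, x)) for row in matrix)


# Matrix of (x, y) |-> (2(x - y), x - y) in the standard basis.
A = [[2, -2],
     [1, -1]]

r = rank(A)                 # dimension of the image
free_columns = len(A[0]) - r  # pivot-free columns, i.e. the nullity

kernel_vector = (1, 1)      # the direction that is "squashed flat"
image_vector = apply(A, (1, 0))
```

The kernel is the line through `kernel_vector` and the image is the line through `image_vector`, one direction each, adding up to the two directions of $R^2$.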

2.11. Corollary. A linear map $A : X \to Y$ is non-singular if and only if
$r(A) = \dim X$. □

2.12. Corollary. An operator $A : X \to X$ is non-singular if and only if $A$
is an isomorphism. □

2.13. Corollary. Suppose $\dim X = \dim Y$, and $A : X \to Y$ is linear. Then

$A$ is an isomorphism if and only if it is injective

and

$A$ is an isomorphism if and only if it is surjective. □


Exercises 1.2 

1. A linear map $A : X \to Y$ is non-singular if and only if $A$ is injective
(if $Ax = Ax'$ what is $A(x - x')$?).

2. If, with bases chosen for $X$, $Y$, $Z$ we have maps $A : X \to Y$ and
$B : Y \to Z$ represented by matrices $[a^i_j]$, $[b^r_s]$, then their composite
$BA : X \to Z$ is represented by the matrix $[b^r_s a^s_j]$.

3. If, with bases chosen for $X$ and $Y$ the maps $A$, $B$ from $X$ to $Y$ have
matrices $[a^i_j]$, $[b^i_j]$ respectively, then the matrix of
$A + B : X \to Y : x \mapsto Ax + Bx$ is $[a^i_j + b^i_j]$.

4. a) The kernel of a linear map $X \to Y$ is a subspace (not just a subset)
of $X$.

b) The image of a linear map $X \to Y$ is a subspace of $Y$.

5. If $\beta = \{b_1, \ldots, b_n\}$ is a basis for $X$, $\omega = \{d_1, \ldots, d_k\}$ is a basis for
$X' \subseteq X$ and $A : X \to Y$ is a linear map, then the following hold.

a) There is a basis $\{d_1, \ldots, d_n\}$ for $X$ including all the vectors in $\omega$
(cf. Exercise 1.7);

b) the vectors $Ad_1, \ldots, Ad_n$ span the image of $A$;

c) if $y$ is in the image of $A$, as the image of both the vectors $x$ and
$x'$ in $X$ (so $y = Ax = Ax'$), then $x' = x + z$, where $z$ is in the kernel
of $A$.

d) If $\ker A = X'$, deduce from (b) that the vectors $Ad_{k+1}, \ldots, Ad_n$ span
the image of $A$, and thence and from (c) that they are a basis for it.

e) Deduce Theorem 2.10.


6. a) Deduce from 5(d) that if $\beta = \{b_1, \ldots, b_n\}$ is a basis for $X$, and
$A : X \to Y$ is an isomorphism, then $A\beta = \{Ab_1, \ldots, Ab_n\}$ is a basis
for $Y$.

b) If $\beta$ is a basis for $X$ and $A : X \to X$ is an isomorphism, the change
of basis matrix $[I]^{A\beta}_{\beta}$ is exactly the matrix $([A]^{\beta}_{\beta})^{\leftarrow}$.

c) Hence, if matrices $[P]$, $[Q]$ are similar by $[P] = [A][Q][A]^{\leftarrow}$, and $P$, $Q$, $A$ are
the maps $X \to X$ defined by $[P]$, $[Q]$, $[A]$ via the basis $\beta$, then $[Q]$ represents
$P$ in the basis $A\beta$.


7. [Exercise on three explicit 2 × 2 matrices A, B, C, asking the reader to verify
identities involving the products AC = CA and AB; the matrix entries are not
recoverable from this copy.]


3. Operators 

Operators (linear maps from a vector space to itself) have a very special role. 
Among the definitions involving only this special class of maps are 

3.01. Definition. An operator on $X$ which is an isomorphism is called an
automorphism. The set $GL(X)$ of all automorphisms of $X$ forms a (Lie)
group, the general linear group of $X$, under composition (cf. Exercise 1.10).
(Not under addition; $I + (-I) = 0$, which is not an automorphism.)

3.02. Definition. An operator A : X —► X is idempotent if AA = A. 
Essentially, this means that A is projecting X onto a subspace, as in the 





figure (where the →’s indicate the movement under A of a sample of vectors),
and a vector having arrived in the subspace L is then left alone by further 
application of A. Hence we shall often call A a projection onto A(X). An 
important class of such operators will concern us in Chapter IV. 

3.03. Definition. A vector $x \neq 0$ is an eigenvector of $A : X \to X$ if $Ax = x\lambda$
for some scalar $\lambda$. Then $\lambda$ is an eigenvalue of $A$, and $x$ is an eigenvector
belonging to $\lambda$. The set of eigenvectors belonging to $\lambda$, together with 0, is a
subspace of $X$ (easily checked), the eigenspace belonging to $\lambda$.

(Eigenvectors are sometimes called characteristic vectors, and correspondingly
eigenvalues are called characteristic roots or values. This conveys
the feel of the German “eigen-” but is more cumbersome and less sonorous.
However, ...values are almost always denoted by $\lambda$, just as unknowns are
by $x$ and beautiful Russian spies by Olga.)

We have already met one example; ker A is the eigenspace belonging 
to 0. Another is familiar; a rotation in three dimensions must leave some 
direction - the axis of rotation - fixed, and so we have eigenvectors in that 
direction belonging to the eigenvalue 1. If A is the identity, then the whole 
of X belongs to the eigenvalue 1. Reflection in the line x = y is 


$$A : R^2 \to R^2 : (x, y) \mapsto (y, x) ;$$

in this case we have eigenvalues $\pm 1$.

Fig. 3.2 (the eigenspaces belonging to -1 and to +1)


3.04. Definition. For $L(X;X)$ we have not only addition and scalar multiplication
as for $L(X;Y)$ but a “multiplication” defined by composition. For
any operators $A$, $B$ their composites $AB$ and $BA$ are again operators on $X$.
The operator algebra of $X$ is the set $L(X;X)$ with these three operations.
This is an “algebra with identity”: for all $A, B, C \in L(X;X)$, $a \in R$ we have

$A(BC) = (AB)C$   (associativity of composition)
$A(B + C) = AB + AC$ and $(A + B)C = AC + BC$   (distributivity)
$a(AB) = (aA)B = A(aB)$
$AI = A = IA$   (composition has an identity)

as for numbers. Unlike that of numbers, this multiplication is not commutative
($AB \neq BA$ in general). Either multiply a pair of matrices such as
$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ both
ways round, if you are best convinced by algebra, or wave your hands in the
air: If $A$ is “rotation through 90° about a vertical axis”, and $B$ is “rotation
through 90° about a northward axis”, both clockwise, experiments with your
elbow as origin will show that $AB \neq BA$.
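
For readers who prefer algebra to elbows, the rotation experiment can be sketched in Python. The coordinate conventions are assumptions made here and not taken from the text: the z-axis is taken as vertical, the y-axis as northward, and the sense of each quarter turn is fixed by the matrices below.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Quarter turn about the (assumed vertical) z-axis.
A = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]

# Quarter turn about the (assumed northward) y-axis.
B = [[0, 0, 1],
     [0, 1, 0],
     [-1, 0, 0]]

AB = matmul(A, B)  # apply B first, then A
BA = matmul(B, A)  # apply A first, then B

commute = (AB == BA)
```

With these conventions `commute` is false: the two composites send the first basis vector to different places.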

There are two important functions from $L(X;X)$ to $R$, one preserving
multiplication and the other addition; the determinant and the trace.

3.05. Determinants. The determinant function may be regarded in several
ways. Algebraically, one may start with either matrices or linear maps.
We shall give here a geometric account of it, with the matrix proof of its 
properties (the least instructive but most direct) indicated in the exercises. 
(Manipulations of this kind are unilluminating to see, but essential practice.) 
In Exercise V.1.11 it emerges from some rather more sophisticated algebra, 
which corresponds more closely to the geometry below and amounts to a 
rigorous version of the same ideas. 

Consider the map $A : R^2 \to R^2$ with matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ in the standard
basis, and examine its effect on the unit square (Fig. 3.3). The area of the
unit square is 1; the area of the parallelogram to which it is taken may be
found, for instance, by adding and subtracting rectangles and right-angled
triangles. Now any other shape may be approximated by squares. The area
of these squares is evidently changed by $A$ in the same proportion as the
unit square, so taking a high-handed Ancient Greek attitude to limits it is
clear that the area of any figure is multiplied by the same quantity $(ad - bc)$,
which we shall call $\det A$.

Fig. 3.3 (area of the parallelogram $= (\tfrac12 cd + ad + \tfrac12 ab) - (\tfrac12 ab + cb + \tfrac12 cd) = ad - bc$)

Fig. 3.4

Thus “what $A$ does to area” is to multiply it by $\det A$, which quantity
therefore, although given as $(ad - bc)$ in terms of the entries for a matrix
for $A$, does not depend as those entries do on the particular basis chosen
for $R^2$. Conveniently, not only the number $\det A$ but the formula for it are
independent of the basis. For any bases at all, if $[A] = \begin{bmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{bmatrix}$ then

$$\det A = a^1_1 a^2_2 - a^1_2 a^2_1 \qquad \text{(Equation D2)}$$



(This is a manipulative algebraic fact, and as such left to the exercises.) It
will often be useful to write the determinant of a matrix
$A = \begin{bmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{bmatrix}$ (or the determinant of any map $A$ represents) as
$\begin{vmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{vmatrix}$.

The alert reader may have noticed that we have sneakily assumed that
“area” is well-defined, which for $R^2$ is true, but how about an arbitrary
two dimensional space? In fact, more than one measure of “area” is possible,
but the “multilinear” ones appropriate to a vector space are all scalar
multiples one of another (Exercise V.1.11), so “what $A$ does to area” is independent
of which measure we pick - they are all multiplied in the same
proportion.

In a similar fashion, if $X$ is 3-dimensional “what $A$ does to volume” is
naturally independent of basis, and is represented by the equally invariant
formula

$$\det A = a^1_1 \begin{vmatrix} a^2_2 & a^2_3 \\ a^3_2 & a^3_3 \end{vmatrix} - a^1_2 \begin{vmatrix} a^2_1 & a^2_3 \\ a^3_1 & a^3_3 \end{vmatrix} + a^1_3 \begin{vmatrix} a^2_1 & a^2_2 \\ a^3_1 & a^3_2 \end{vmatrix} \quad \text{where } [A] = \begin{bmatrix} a^1_1 & a^1_2 & a^1_3 \\ a^2_1 & a^2_2 & a^2_3 \\ a^3_1 & a^3_2 & a^3_3 \end{bmatrix} .$$

Or expanded in detail,

$$\det A = a^1_1 a^2_2 a^3_3 - a^1_1 a^2_3 a^3_2 - a^1_2 a^2_1 a^3_3 + a^1_2 a^2_3 a^3_1 + a^1_3 a^2_1 a^3_2 - a^1_3 a^2_2 a^3_1 . \qquad \text{(Equation D3)}$$

This can be checked by “Euclidean geometry” calculations of the volume of 
the parallelepiped to which A takes the unit cube (Fig. 3.5). 
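
In four dimensions, starting with the obvious definition for “hypervolume”
of a “hyperbrick” by multiplying all four edge lengths together, the
same approach leads to a determinant for $A$. In any basis we have

$$\det A = \begin{vmatrix} a^1_1 & a^1_2 & a^1_3 & a^1_4 \\ a^2_1 & a^2_2 & a^2_3 & a^2_4 \\ a^3_1 & a^3_2 & a^3_3 & a^3_4 \\ a^4_1 & a^4_2 & a^4_3 & a^4_4 \end{vmatrix}$$

$$= a^1_1 \begin{vmatrix} a^2_2 & a^2_3 & a^2_4 \\ a^3_2 & a^3_3 & a^3_4 \\ a^4_2 & a^4_3 & a^4_4 \end{vmatrix} - a^1_2 \begin{vmatrix} a^2_1 & a^2_3 & a^2_4 \\ a^3_1 & a^3_3 & a^3_4 \\ a^4_1 & a^4_3 & a^4_4 \end{vmatrix} + a^1_3 \begin{vmatrix} a^2_1 & a^2_2 & a^2_4 \\ a^3_1 & a^3_2 & a^3_4 \\ a^4_1 & a^4_2 & a^4_4 \end{vmatrix} - a^1_4 \begin{vmatrix} a^2_1 & a^2_2 & a^2_3 \\ a^3_1 & a^3_2 & a^3_3 \\ a^4_1 & a^4_2 & a^4_3 \end{vmatrix} \qquad \text{(Equation D4)}$$
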


And so forth for higher dimensions. If you are ready to believe that det A is 
only and exactly “what A does to volume” you can ignore the next section, 
as being preparation for proving the obvious. The important thing about 
determinants is that they exist and have nice properties, not the algebra 
which justifies the properties. 

3.06. Formulae. The general way to find the determinant of any operator
$A$ from an $n \times n$ matrix representing it should now be clear: go along the
top row taking alternately + and - each entry times the determinant of the
$(n - 1) \times (n - 1)$ matrix got from $A$ by leaving out the top row and the
column that the entry is in (Fig. 3.6). (Notice that the number of multiplications
needed altogether is n! which increases rather fast with n; for example
5! = 120, 7! = 5,040. This is why finding the determinant of matrices bigger
than 4 × 4 occurs as an exercise only in computer textbooks. Reducing
the number of multiplications is an art in itself.) This describes well how to
compute it, though rather uneconomically, but does not lead straight to a
formula convenient for proving general properties of $n \times n$ determinants. To
get such a formula, it is best to back off and approach matters a little more
symmetrically.

Fig. 3.6

Fig. 3.7
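
The top-row recipe just described translates almost word for word into a recursive function. The following is a minimal sketch in Python; the name `det_cofactor` is chosen here, and the function is meant as an illustration of the rule and not as an efficient method.

```python
def det_cofactor(M):
    """Determinant of a square matrix (list of rows) by expansion along the top row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Leave out the top row and the column that the entry is in.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # Signs alternate +, -, +, ... along the top row.
        total += (-1) ** j * M[0][j] * det_cofactor(minor)
    return total


d2 = det_cofactor([[1, 2],
                   [3, 4]])

d3 = det_cofactor([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 10]])
```

Each call on an $n \times n$ matrix makes $n$ calls on $(n - 1) \times (n - 1)$ matrices, which is where the factorial growth mentioned above comes from.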

Firstly, notice that in any one of the actual multiples that are involved
(in for instance D3 above) no two of the entries multiplied are in the same
row or the same column. Typically, they appear arranged like the asterisks
in Fig. 3.7 - exactly one entry in each row and column. Moreover, all such
arrangements of n entries do get multiplied up and added, with either a +
or a - sign, to get the determinant. If they all had + signs, we’d be home,
but we must find a systematic way to indicate which multiple has which sign.
Now since each such set of entries, $M$ say, has exactly one member in each
column, we can list $M$ in the order of the columns containing its members:

$$M = \{a^{m_1}_1, \ldots, a^{m_n}_n\}$$

say, where $m_i$ means “the number of the row in which the element of $M$ in
column $i$ sits”. Clearly, $M$ is completely specified by $m_1, \ldots, m_n$, or to put
it a little differently, by the function

$$m : \{1, \ldots, n\} \to \{1, \ldots, n\} : i \mapsto m_i .$$


Since the elements of $M$ are all in different rows, $m$ is a bijection from the
finite set $\{1, \ldots, n\}$ to itself - that is, a permutation of the numbers $1, \ldots, n$.
(This and its properties could be related to the geometry, at the expense
of greater space. At the moment we want the quickest possible algebraic
back-up for the geometry that will follow this section.) Now (Exercise 1a),
any permutation can be built up by successively switching neighbouring pairs
(1,2,3,4,5 goes to 1,2,4,3,5 for example), and this can generally be done in
several different ways. Moreover (Exercise 1b), the number of such switches
required in any such building up is for a given permutation either always
even or always odd. This lets us define the sign of $m$:

$$\operatorname{sgn}(m) = \begin{cases} +1 = (-1)^{\text{even}} \\ -1 = (-1)^{\text{odd}} \end{cases} \quad \text{according as } m \text{ is an } \begin{cases} \text{even} \\ \text{odd} \end{cases} \text{ combination}$$

of switches. Finally, it turns out that the sign of $m$ is exactly the sign we want
for our multiple. (Exercise 1c,d) So if we denote the set of all permutations
of $1, \ldots, n$ by $S_n$ (it is in fact a group (cf. Exercise 1.10) - the symmetric
group on $1, \ldots, n$) we can at last write down a nice closed formula

$$\det[a^i_j] = \sum_{m \in S_n} \operatorname{sgn}(m)\, a^{m_1}_1 a^{m_2}_2 \cdots a^{m_n}_n$$

for the determinant of a matrix.
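
The closed formula can be evaluated literally for small matrices. The sketch below, in Python, is one way to do so; the names `sign` and `det_permutation` are chosen here. As an implementation shortcut it computes the sign by counting inversions (pairs that are out of order), whose parity agrees with the parity of the number of neighbour-switches.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images.

    Counts inversions; an even count gives +1, an odd count gives -1.
    """
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_permutation(M):
    """Determinant as the signed sum over all permutations m of the products
    of the entries in row m_i of column i."""
    n = len(M)
    total = 0
    for m in permutations(range(n)):
        term = sign(m)
        for col in range(n):
            term *= M[m[col]][col]  # entry in row m_col, column col
        total += term
    return total


d = det_permutation([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 10]])
```

The sum has n! terms, so like the top-row expansion this is useful for proofs and small checks, not for large matrices.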

With this we can prove algebraically the important properties of determinants
that are geometrically obvious, but harder to prove rigorously
(Exercises 2-5). Returning to the geometrical viewpoint, we can see these
properties directly.

3.07. Lemma. $\det I = 1$.

Proof. Either calculate from the matrix $[\delta^i_j]$, or observe that $I$ leaves volume,
along with everything else, unchanged. □

3.08. Theorem (The Product Rule). For any two operators $A$, $B$ on $X$,

$$\det(AB) = \det A \det B .$$

Proof. $\det A$ is what applying $A$ multiplies volumes by.
$\det B$ is what applying $B$ multiplies volumes by.
$\det(AB)$ is what the operation of (applying $B$ and then applying $A$)
multiplies volumes by.

End of proof. (Compare Exercise 2.) □
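
A quick numerical instance of the product rule, as a sketch in Python. The two 2 × 2 matrices are arbitrary choices made here, and `det2` and `matmul2` are hypothetical helper names.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix, as in Equation D2."""
    (a, b), (c, d) = M
    return a * d - b * c


def matmul2(P, Q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, 2]]

AB = matmul2(A, B)
product_rule_holds = det2(AB) == det2(A) * det2(B)
```

Here `det2(A)` is -2, `det2(B)` is -5 and `det2(AB)` is 10, so area is doubled and flipped by one map, scaled by five and flipped by the other, and scaled by ten with no net flip by the composite.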

3.09. Lemma. If $A : X \to X$ and $\dim X = n$, then $\det(aA) = a^n \det A$.

Proof. $\det(aA) = \det(a(IA)) = \det((aI)A) = \det(aI) \det A$. Evidently
$aI$, which multiplies the length of each side of the n-cube by $a$, multiplies its
volume by $a^n$. □


3.10. Theorem. An operator $A$ on $X$ is an automorphism if and only if
$\det A \neq 0$.


Proof. If $A$ is an automorphism, then there exists $A^{\leftarrow}$ such that $AA^{\leftarrow} = I$.
Hence

$$\det A \det(A^{\leftarrow}) = \det(AA^{\leftarrow}) = \det I = 1$$

and thus

$$\det A = \frac{1}{\det(A^{\leftarrow})} \neq 0 .$$

Conversely, if $A$ is singular, the unit cube is squashed flat by $A$ in
the direction of some singular vector. Thus its image has zero volume, so
$\det A = 0$. (This argument is made rigorous, via algebra, in Exercise 4.) Thus
if $\det A \neq 0$, $A$ is non-singular and hence by 2.12 $A$ is an automorphism. □

3.11. Orientation. By 3.10 all automorphisms have non-zero determinant. 
Hence they fall naturally into two classes - those with positive and those 
with negative determinant. Now if X is 1-dimensional, A : X —► X reduces 
to multiplication by some scalar a. The determinant det A is just a, and is 
positive or negative according as $A$ preserves or reverses direction. In two
dimensions $\det A$ is positive or negative according as $A$ merely distorts the unit square
into a parallelogram or turns it over as well. In three dimensions, $\det A$
is positive or negative according to whether $A$ preserves or exchanges left
and right handedness, apart from warping hands. We are led to a general
definition:

Fig. 3.8

Fig. 3.9

An automorphism $A : X \to X$ is orientation preserving or reversing according
as $\det A$ is positive or negative. (A precise definition of “orientation”
is given in Exercise XI.3.1.)

3.12. Remark. If $A : X \to Y$ is a linear map from $X$ to a different space $Y$,
even with $\dim X = \dim Y$, then $\det A$ is not defined; we could change “what
$A$ does to volume” by altering our measure of volume at one end but not at
the other. However, if we pick ordered bases $\beta$, $\beta'$ for $X$ and $Y$ they define an
isomorphism $B : Y \to X$ (cf. proof of 2.06), and hence a quantity $\det(BA)$
since $BA : X \to X$. This is exactly the result of computing the determinant
of the matrix $[A]^{\beta'}_{\beta}$. Now since $B$ is an isomorphism, $A$ is an isomorphism
if and only if $BA$ is an automorphism ($Ax = 0 \iff BAx = 0$, since $B$ is
non-singular: then apply 2.12), that is if and only if $\det(BA) \neq 0$. Thus the
determinant of any matrix representing $A$ remains a valid test for singularity.

If we have a measure of volume already chosen at each end, with a little 
care det A can be reinstated in its full glory as “what A does to volume” 
(cf. Exercise V.1.12). 

3.13. Characteristic Equation. One of det’s many uses is concerned with
eigenvalues:

$\lambda$ is an eigenvalue of $A$
  $\iff$ $Ax = \lambda x$ for some non-zero $x$
  $\iff$ $Ax - \lambda x = 0$ for some non-zero $x$
  $\iff$ $(A - \lambda I)x = 0$ for some non-zero $x$
  $\iff$ $\det(A - \lambda I) = 0$ . (by 3.10)

Now for any choice of basis, giving a matrix $[a^i_j]$ for $A$, $\det(A - \lambda I)$ is a
polynomial in $\lambda$. Its coefficients are various terms built up from the $a^i_j$’s.
Hence $\lambda$ is an eigenvalue of $A$ if and only if $\lambda$ is a real root of the n-th
order polynomial equation $\det(A - \lambda I) = 0$, which is therefore called the
characteristic equation of $A$. (With real vector spaces, complex roots are
irrelevant. For example, $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} : R^2 \to R^2$, rotation through 90°, has characteristic
equation

$$0 = \det\left( \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) = \det \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1$$

with no real roots. Clearly from the picture there are no eigenvectors or
corresponding eigenvalues.)

Fig. 3.10
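
For a 2 × 2 matrix the characteristic equation is a quadratic, $\lambda^2 - (a + d)\lambda + (ad - bc) = 0$, so its real roots can be read off from the discriminant. The following Python sketch does this for the rotation above and for the reflection of 3.03; the function name `real_eigenvalues_2x2` is chosen here for illustration, and floating-point square roots are used, so exact equality is only reliable for simple cases like these.

```python
import math


def real_eigenvalues_2x2(M):
    """Real roots of det(M - lambda I) = 0 for a 2 x 2 matrix M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det  # discriminant of the quadratic
    if disc < 0:
        return []  # only complex roots, which are irrelevant over the reals
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})


rotation = [[0, -1],
            [1, 0]]     # rotation through 90 degrees
reflection = [[0, 1],
              [1, 0]]   # reflection in the line x = y

rotation_eigenvalues = real_eigenvalues_2x2(rotation)
reflection_eigenvalues = real_eigenvalues_2x2(reflection)
```

The rotation yields an empty list, in agreement with $\lambda^2 + 1$ having no real roots, while the reflection yields the eigenvalues -1 and +1.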

3.14. Trace. The meaning of the trace of an operator is less clear geometrically
than that of the determinant. Algebraically, it is very simply defined:
if $[A] = [a^i_j]$,

$$\operatorname{trace} A = \operatorname{tr} A = a^1_1 + a^2_2 + \cdots + a^n_n = a^i_i \text{ in the summation convention.}$$

It is obvious that $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$, and a simple check (Exercise 6)
shows that this formula, like that for determinant, gives the same answer
regardless of the basis in terms of which $A$ is expressed.
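
Both claims can be spot-checked on an example. The Python sketch below uses arbitrary 2 × 2 matrices chosen here, with `R` playing the role of a change of basis matrix; the helper names `trace`, `matmul` and `inverse_2x2` are hypothetical.

```python
from fractions import Fraction


def trace(M):
    """Sum of the diagonal entries of a square matrix (list of rows)."""
    return sum(M[i][i] for i in range(len(M)))


def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


Q = [[1, 2],
     [3, 4]]
R = [[1, 1],
     [1, -1]]   # an invertible matrix, standing in for a change of basis

# A matrix similar to Q: the same operator seen in another basis.
P = matmul(R, matmul(Q, inverse_2x2(R)))

B = [[0, 1],
     [5, 2]]
Q_plus_B = [[Q[i][j] + B[i][j] for j in range(2)] for i in range(2)]

trace_is_invariant = trace(P) == trace(Q)
trace_is_additive = trace(Q_plus_B) == trace(Q) + trace(B)
```

The entries of `P` differ from those of `Q`, but the sum along the diagonal is the same.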

Trace can partly be thought of by its role in an important special case,
where it measures how “close to the identity” an operator is. Each diagonal
entry such as $a^3_3$ is “the 3rd component of the image $Ab_3$ of the basis vector
$b_3$” in $b_1, \ldots, b_n$ coordinates. If $A$ is a rotation, so keeping all vectors
the same length (and while we’re assuming we are in a situation with length
defined we might as well take the basis vectors to be of unit length), this
comes to exactly $\cos\alpha_3$ where $\alpha_3$ is the angle $b_3$ has been turned through.
(If we have lengths defined, then we have angles, by $c^2 = a^2 + b^2 - 2ab\cos\alpha$
for a triangle.) The trace is the sum of these cosines, and thus for a rotation
varies from $n = \operatorname{tr} I = \dim X$ to $-n$. For the rotation in 3.13, all vectors
turn through 90°, and the trace is $0 + 0 = 0$.

This description is complicated for general operators by the fact that,
trivially, $\operatorname{tr}(aA) = a(\operatorname{tr} A)$, so that the “size” of an operator comes into
play; the trace function is the only major one in linear algebra that seems
to be genuinely more algebraic than geometric. Like the determinant, it
can be defined without reference to a basis (cf. V.1.12) but this takes more
theory than the coordinate approach and in this instance is no more intuitive.
(An example in which trace is intimately involved is discussed at length in
Chapter IX.6.) If $A$ is a projection, $\operatorname{tr} A$ is just the dimension of its image,
as is obvious by a convenient choice of basis.

Fig. 3.11

Exercises 1.3 

1. a) Any permutation m can be produced by successively switching neigh¬ 

bouring pairs. (Hint: get the number m*“( 1) into 1st place and pro¬ 
ceed inductively.) 

b) If * x * 2 ... th is a composite of neighbour-switches *,-, then 

(t\t2 . • • *h) = *h*/i—1 • • • t\ • 

Show that if such a composite ends up with everything where it 
started, h must be even. (Hint: show that any given switch must 
be used an even number of times.) Deduce that if 

S\S2 ... Sk = m = *i* 2 •••*/» 

then k + h is even and hence (— l) k = (— 1 )\ 

c) Check that the signs in equations D2, D3 and D4 coincide with the 
signs obtained by considering permutations. 

d) Prove by induction that this holds in general. 

2. Let [a*], [&*•] be square matrices with determinants det A, detB, re¬ 
spectively. 

a) Show that if m E S n then sgn(m) = sgn(m" 1 ) and deduce that 

det A = “1.4 •••<„■ 

m£S n 

That is, rows and columns may be interchanged without altering the 
determinant. 

b) Prove that the determinant function on matrices defines a linear func¬ 
tion on the space of possible i-th columns for any fixed choice of the 
other columns, for any i = 1 ,..., n. 

c) Consider [a}] as the ordered set of columns (a\, a^, ..., a l n ) where for 
instance 



For any m E S n , prove that sgn(m) det A = det (a ' mi , aj na ,..., aj nn ). 
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d) Prove that if [c*-] = [a‘ t 6^] then, from part (b) 

detC = det[cj] = £ •• -C* ><,••• ,«mj • 

mes n 

e) Deduce Theorem 3.08: det (7 = det AdetB. 

3. a) Deduce from 3.07 and 3.08 that if A is invertible then det A*~ = 

(det A)" 1 . 

b) Deduce from (a) and 3.08 that for matrices P, <2, R with R invertible, 
if P = RQR*~ then det P = det Q. Hence deduce that if P represents 
P according to the basis /?, for any other basis /?' we have det([P]^ ; ) = 
det P. 

4. If A : X —► X is singular then X has an ordered basis /? whose first 
member is a singular vector for A. Deduce by considering [A]p that 
det A = 0. 

5. Prove, from the general formula, that for any n x n matrix A = [a*]: 

a) det(6A) = b n det A, where bA = [6aj]. 

b) If B is obtained by multiplying some row or column of A by 6, det B = 
6 det A. 

c) If B is obtained by adding some multiple of one row (or column) of A 
to another row (or column) of A, detP = det A. 

(Results (b) and (c) can save a great deal of work in computing large 
determinants by hand. But who does, nowadays?) 

6. If $b^i_j = \tilde c^i_k a^k_l c^l_j$, where $\tilde c^i_k c^k_j = \delta^i_j$ (so $[\tilde c^i_j] = [c^i_j]^{\leftarrow}$), then $b^i_i = a^i_i$.

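7. Let $x_1, \dots, x_k$ be eigenvectors belonging to eigenvalues $\lambda_1, \dots, \lambda_k$ of an operator $A$. If $x_1 = \sum_{i=2}^k a^i x_i$ for some $a^2, \dots, a^k \in R$ then $0 = \sum_{i=2}^k a^i x_i (\lambda_i - \lambda_1)$, so that

$$x_2 = -\sum_{i=3}^k \frac{a^i(\lambda_i - \lambda_1)}{a^2(\lambda_2 - \lambda_1)}\, x_i \quad \text{if } a^2 \neq 0 \text{ and } \lambda_2 \neq \lambda_1 .$$

Deduce inductively that if $\lambda_1, \dots, \lambda_k$ are distinct, $x_1, \dots, x_k$ are independent.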

8. For any projection $P : X \to X$, the only choices of $y \in P(X)$, $z \in \ker P$ such that $x = y + z$ are $y = P(x)$, $z = x - y$ (cf. VII. Exercise 3.1d).
“Let the thought of the dharmas as all one bring you

to the So in Itself: thus their origin is forgotten 

and nothing is left to make us pit one against another.” 

Seng-ts’an 


1. Spaces 

Our geometrical idea (1.1.01) of a vector space depended on a choice of some 
point 0 as origin. However, just as for bases, there may be more than one 
plausible choice of origin. Similarly, it may be useful to avoid committing 
oneself on the question (a fact discovered by Galileo). For this purpose, and 
for the sake of some language useful even when we have an origin, we shall 
consider affine spaces. 

The basic idea is a return to the school notion of a vector, as going from 
one point A to another point B in space. Points are just points, without 
direction, but their separations have direction and length. Thus we define:- 



1.01. Definition. An affine space with vector space T is a non-empty set X 
of points and a map 

$$d : X \times X \to T ,$$

called a difference function, such that for any $x, y, z \in X$:-

Ai) $d(x, y) + d(y, z) = d(x, z)$

Aii) The restricted map $d_x : \{x\} \times X \to T : (x, y) \mapsto d(x, y)$ is bijective.

Condition Ai) says that “going from x to y, then y to z” is a change 
by the same directed distance as going directly to z. It has two important 
immediate consequences: 

(a) Put $y = z = x$, then

$$2d(x, x) = d(x, x) + d(x, x) = d(x, x) .$$

So $d(x, x) = 0$ for all $x \in X$; hence

(b) putting $z = x$,

$$d(x, y) + d(y, x) = d(x, x) = 0$$

so $d(x, y) = -d(y, x)$ for all $x, y \in X$.

Condition Aii) just says that given $x \in X$ and $t \in T$, there is a unique
point to be reached by “going the directed distance $t$, starting from $x$”. We
denote this point by $x + t$: if $t = 0$, $x + t = x$ by (a) above. Similarly, if $V$
is a subset of $T$ we denote $\{ x + t \mid t \in V \}$ by $x + V$.
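
As a concrete check of Definition 1.01, here is a minimal sketch in Python. It assumes the simplest model, the plane with its natural affine structure $d(x, y) = y - x$; the helper names `d`, `add` and `translate` are ours, chosen only for illustration.

```python
from fractions import Fraction

def d(x, y):
    """Difference function of the natural affine structure: d(x, y) = y - x."""
    return tuple(yi - xi for xi, yi in zip(x, y))

def add(u, v):
    """Sum of two free vectors in T."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def translate(x, t):
    """The point x + t, i.e. the unique y with d(x, y) = t (axiom Aii)."""
    return tuple(xi + ti for xi, ti in zip(x, t))

x = (Fraction(1), Fraction(2))
y = (Fraction(4), Fraction(-1))
z = (Fraction(0), Fraction(5))

# Ai): going from x to y, then from y to z, equals going directly from x to z.
assert add(d(x, y), d(y, z)) == d(x, z)
# Consequence (a): d(x, x) = 0.
assert d(x, x) == (0, 0)
# Consequence (b): d(x, y) = -d(y, x).
assert d(x, y) == tuple(-c for c in d(y, x))
# Aii): translating x by d(x, y) reaches y.
assert translate(x, d(x, y)) == y
```

Exact rational arithmetic (`Fraction`) is used so that the axioms are tested as equalities rather than up to rounding.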

1.02. Tangent spaces. We can use the bijection $d_x$ given by Aii) to define
a vector space structure for $\{x\} \times X$ from that of $T$, by

$$(x, y) + (x, z) = d_x^{\leftarrow}\big( d_x(x, y) + d_x(x, z) \big)$$

$$(x, y)a = d_x^{\leftarrow}\big( (d_x(x, y))a \big)$$

(cf. Exercise 1). The set $\{x\} \times X$ with this structure will be called the tangent
space to $X$ at $x$, and denoted by $T_x X$.

For a one-dimensional $T$ we have a picture, but two dimensions of $T$
require four for the analogous diagram. If $v \in T_x X$ we denote $x + d_x(v)$ also
by $x + v$.

The vectors in $T_x X$ are called tangent or bound vectors at $x$; the vectors
in $T$ are called free. The reason for the word “tangent” will become apparent
when we start bending pieces of affine spaces around and sticking them
together to make manifolds. (Even if an affine space $X$ has a vector space
with some other symbol, $S$ say, we shall still call $\{x\} \times X$ with this vector
space structure $T_x X$, to keep the association with “tangent”.) We call $d_x$ a
freeing map, its inverse a binding map.

Fig. 1.4


1.03. Subspaces. The requirements for a subspace of a vector space $X$ were
essentially that it should be again a vector space (I.1.03). Since we already
knew that $x + y$ and $y + x$, etc., were equal, it was only needful to require
that within the subspace they should still be well defined.

In the same way, $X' \subseteq X$ is an affine subspace or flat of $X$ if

i) $d(X' \times X')$ is a vector subspace of the vector space $T$ for $X$, and

ii) $X'$ is an affine space, with vector space $d(X' \times X')$ and difference
function

$$d : X' \times X' \to d(X' \times X') : (x, y) \mapsto d(x, y) .$$

If $d(X' \times X')$ is a hyperplane of $T$, then $X'$ is an affine hyperplane of
$X$. Evidently if $V$ is a vector subspace of $T$ and $x \in X$, then $x + V$ is an
affine subspace of $X$.

Notice that any vector space $X$ has a natural affine structure, with vector
space $X$ itself and the difference function

$$X \times X \to X : (x, y) \mapsto y - x .$$

Hence we may talk of an affine subspace, or hyperplane, of a vector space $X$,
which need not be a vector subspace of $X$ (cf. 1.06 below).

The affine subspace generated by a set $S$, or affine hull $H(S)$ of $S$ (compare
I.1.04), is the smallest affine subspace containing $S$. (It is easy to show
that the intersection of any set of subspaces is again a subspace, so

$$\bigcap \{ X' \mid X' \text{ an affine subspace of } X,\ S \subseteq X' \}$$

is a subspace. Evidently it contains $S$ and it is contained in every other such;
so it is the smallest (cf. Exercise I.1.3a).) Pairs of points generate lines,
non-collinear triples generate planes, etc. We could define “affine independence”
analogously to Definition I.1.07 (for instance three points in a straight line
are dependent) together with “affine rank” etc.: we shall not develop this
beyond a consistency check in Exercise 2.3.

1.04. Definition. The translate $X' + t$ of an affine subspace $X'$ of $X$ by a
vector $t \in T$ is defined as the affine subspace $\{ x + t \mid x \in X' \}$ (cf. Exercise 2b,
and Definition 3.03).

Two affine subspaces $X'$, $X''$ of $X$ are parallel if $d(X' \times X') = d(X'' \times X'')$.

1.05. Lemma. Two affine subspaces $X'$, $X''$ of $X$ are parallel if and only if
$X'' = X' + t$ for some $t \in T$ (not necessarily unique).

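Proof

1) If $X'$, $X''$ are parallel, choose $x' \in X'$, $x'' \in X''$ and set $t = d(x', x'')$.
Then

$$\begin{aligned}
y'' \in X'' &\iff d(x'', y'') \in d(X'' \times X'') \\
&\iff d(x'', y'') \in d(X' \times X') \\
&\iff d(x', y'') = d(x', x'') + d(x'', y'') \quad \text{by Ai)} \\
&\phantom{\iff d(x', y'')} = s + t , \ \text{where } s \in d(X' \times X') \\
&\iff y'' \in X' + t \quad \text{(cf. Exercise 2b)}
\end{aligned}$$

Hence $X'' = X' + t$.

2) If $X'' = X' + t$, and $x'', y'' \in X''$ then

$$\begin{aligned}
d(x'', y'') &= d(x' + t, y' + t) \quad \text{for some } x', y' \in X' \text{ (definition of } X' + t) \\
&= d\big(y' + (t - d(x', y')),\ y' + t\big) \quad \text{(cf. Exercise 2d)} \\
&= t - (t - d(x', y')) \\
&= d(x', y') \\
&\in d(X' \times X') .
\end{aligned}$$

Fig. 1.5

Hence,

$$d(X'' \times X'') \subseteq d(X' \times X') .$$

Similarly

$$d(X' \times X') \subseteq d(X'' \times X'') .$$

Hence,

$$d(X'' \times X'') = d(X' \times X') .$$

□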


1.06. Lemma. For $X$ a vector space, $X' \subseteq X$ is an affine subspace of $X$ if
and only if $X'$ is a translate of some vector subspace of $X$.

Proof. If $X'$ is an affine subspace of $X$, set $X'' = \{ x - y \mid x, y \in X' \} = d(X' \times X')$.

Then $X''$ is a vector subspace of $X$ by definition (1.03), and

$$\begin{aligned}
d(X'' \times X'') &= X'' \quad \text{since } 0 \in X'' \\
&= d(X' \times X')
\end{aligned}$$

and $X'$ is a translate of $X''$ by Lemma 1.05.

If $X'$ is a translate of a vector subspace then it is an affine subspace by
Exercise 2c. □

1.07. Definition. The dimension, $\dim X$, of an affine space $X$ is the dimension
of its space of free vectors (cf. also Exercise 2.3).

1.08. Coordinates. If we choose an ordered basis for $T$, we have an isomorphism
$A : T \to R^n$, by I.2.06. If we then “choose as origin” some point
$a \in X$, the composite bijection

$$C_a : X \to T_a X \xrightarrow{\ d_a\ } T \xrightarrow{\ A\ } R^n$$

$$x \mapsto (a, x) \mapsto d(a, x) \mapsto A(d(a, x))$$

defines a “choice of coordinates” for $X$, or chart on $X$. (A chart of an
ocean assigns, as labels to points of the ocean, pairs of numbers - 23° N,
15° W etc. - and thus is essentially a function: Ocean $\to R^2$. Unlike charting
a plane, however, we cannot choose coordinates nicely all over the Earth;
longitude, for instance, is not defined at the poles. This leads us to the
notion of local chart that we use for manifolds.) Notice that we are not using
$C_a$ to make $X$ a vector space, in contrast to the way we make $T_a X = \{a\} \times X$
a vector space by $d_a$; we are just using it for labelling points. In the same
way one does not add the coordinates of Greenwich to those of Montreal and
get anything of any significance. Fixing an origin for $X$ and a basis for $T$,
we label points $x \in X$ by their images in $R^n$; this is illustrated in Fig. 1.6 for
two such choices for the plane (if you don’t bend it) of this page.

The basis $\beta = (b_1, \dots, b_n)$ for $T$ defines a basis $(d_x^{\leftarrow}(b_1), \dots, d_x^{\leftarrow}(b_n))$ for
each $T_x X$. We denote this basis by $\beta_x$, and its members also by $\beta_{1x}, \dots, \beta_{nx}$.

Fig. 1.6


If we have two different choices of origin and basis, $a, \beta$ and $a', \beta'$ say,
to change from the first system of coordinates to the second we must apply

$$R^n \to R^n : (x^1, \dots, x^n) \mapsto [I_T]^{\beta'}_{\beta} \begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix} + A'(d(a', a))$$

where $A'$ is the map $T \to R^n$ given by the basis $\beta'$. In the formula for
individual coordinates, this becomes

$$x^{i'} = b^i_j x^j + a^i$$

where $b^i_j$ is the $i$-th coordinate in the $\beta'$ system of the $j$-th vector in $\beta$, and
$a^i$ is the $i$-th coordinate of the vector $d(a', a)$ from $a'$ to $a$ in the $\beta'$ system.
We shall not often need this particular operation.
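
The change-of-chart rule can be checked numerically. The sketch below assumes the plane with its natural affine structure and two arbitrarily chosen origin/basis pairs; `coords`, `chart` and `change` are illustrative helper names of ours, and the $2 \times 2$ system is solved by Cramer's rule purely for brevity.

```python
from fractions import Fraction as F

def d(x, y):
    """Difference vector from x to y in the plane."""
    return (y[0] - x[0], y[1] - x[1])

def coords(v, basis):
    """Coordinates of the free vector v in the ordered basis (b1, b2), by Cramer's rule."""
    (p, q), (r, s) = basis              # b1 = (p, q), b2 = (r, s)
    det = p * s - q * r
    return ((v[0] * s - v[1] * r) / det, (p * v[1] - q * v[0]) / det)

def chart(a, basis, x):
    """The chart C_a: label x by the coordinates of d(a, x)."""
    return coords(d(a, x), basis)

a, beta = (F(0), F(0)), ((F(1), F(0)), (F(0), F(1)))
a2, beta2 = (F(3), F(-1)), ((F(1), F(1)), (F(-1), F(2)))

# cols[j][i] is b^i_j: the i-th coordinate, in the beta' system, of the j-th vector of beta.
cols = [coords(b, beta2) for b in beta]
# shift[i] is a^i: the i-th coordinate of d(a', a) in the beta' system.
shift = coords(d(a2, a), beta2)

def change(old):
    """Apply x^{i'} = b^i_j x^j + a^i to old coordinates."""
    return tuple(sum(cols[j][i] * old[j] for j in range(2)) + shift[i]
                 for i in range(2))

x = (F(5), F(7))
assert change(chart(a, beta, x)) == chart(a2, beta2, x)
```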


Exercises II. 1 

1. If $T_x X$ has the vector space structure defined in 1.02, then $d_x : T_x X \to T$
is an isomorphism.

2. a) Find a subset $S \subseteq R$ such that any vector $v \in R$ (treating $R$ as a
real vector space) occurs as $u - w$ for some $u$ and $w \in S$, so that $S$
satisfies 1.03i but $S$ is not a flat of $R$.

Show that if $X' \subseteq X$ satisfies 1.03i, and also $X' = x + d(X' \times X')$
for some $x \in X'$, $X'$ is a flat of $X$.

b) For any $x \in X'$, $X' + t = \{ (x + t) + s \mid s \in d(X' \times X') \}$.

c) Prove that $X' + t$ is an affine subspace of $X$.

d) Prove that if $x, x' \in X$, $t \in T$, then $x' + t = x + t + d(x, x') = x + t - d(x', x)$.

e) Prove that if $t_1, t_2 \in T$, $(x + t_1) + t_2 = x + (t_1 + t_2)$.
2. Combinations of Points

We cannot add two points $x$, $y$ in an affine space $X$, any more than we can
add the positions of London and Glasgow. But we can talk about the point
“midway between them”: namely, the point $x + \tfrac12 d(x, y)$ reached from $x$ by
going halfway to $y$ (translating by half the difference vector). But $x + \tfrac12 d(x, y)$
is a rather asymmetrical name for a symmetrical notion; so is $y + \tfrac12 d(y, x)$,
which ought to be the same point. Indeed,

$$y = x + d(x, y) ,$$

so $y + \tfrac12 d(y, x) = x + d(x, y) + \tfrac12(-d(x, y))$, by Ai) in 1.01, hence

$$y + \tfrac12 d(y, x) = x + \tfrac12 d(x, y) .$$

So we give it the symmetrical name

$$\tfrac12 x + \tfrac12 y$$

without asserting that “$\tfrac12 x$” or “$+$” here mean anything: we are just
abbreviating $x + \tfrac12 d(x, y)$ and $y + \tfrac12 d(y, x)$ symmetrically. (But when they do have
separate meanings, because $X$ is a vector space, no ambiguity arises. Giving
$X$ its natural affine structure (1.03),

$$x + \tfrac12 d(x, y) = x + \tfrac12 (y - x) = \tfrac12 x + \tfrac12 y$$

anyway.)

Of course, $\tfrac12 x + \tfrac12 y$ lies on the line through $x$ and $y$. So in fact does any
point $x + \lambda d(x, y)$: this is a special case of Exercise 2b, which we need not
prove yet. Again, $x + \lambda d(x, y)$ is asymmetrical in starting from $x$, and we
have

$$y + (1 - \lambda)d(y, x) = x + d(x, y) - (1 - \lambda)d(x, y) = x + \lambda d(x, y)$$

and we would prefer a symmetric notation.

2.01. Definition. The affine combination

$$\mu x + \lambda y , \quad \text{where } \mu + \lambda = 1$$

of $x, y \in X$ is the point defined equivalently by

$$x + \lambda d(x, y) \quad \text{or} \quad y + \mu d(y, x) .$$

(Notice that $\mu x + \lambda y$ is not defined if $\mu + \lambda \neq 1$.)

We shall further abbreviate $\mu x + (-\lambda)y$ to $\mu x - \lambda y$, as in Fig. 2.1. Notice
that $\mu x + \lambda y$ is between $x$ and $y$ exactly when $\lambda$, $\mu$ are both positive.
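
The equivalence of the two descriptions in Definition 2.01 is easy to test in coordinates. This is only a sketch, assuming the plane with its natural affine structure; `affine_combination` and the other helper names are ours.

```python
from fractions import Fraction as F

def d(x, y):
    """Difference vector d(x, y) = y - x."""
    return tuple(yi - xi for xi, yi in zip(x, y))

def translate(x, t):
    """The point x + t."""
    return tuple(xi + ti for xi, ti in zip(x, t))

def scale(c, t):
    """Scalar multiple of a free vector."""
    return tuple(c * ti for ti in t)

def affine_combination(mu, x, lam, y):
    """mu*x + lam*y, defined only when mu + lam = 1, as x + lam*d(x, y)."""
    if mu + lam != 1:
        raise ValueError("mu*x + lam*y is only defined when mu + lam = 1")
    return translate(x, scale(lam, d(x, y)))

x, y = (F(0), F(0)), (F(4), F(2))
mu, lam = F(1, 4), F(3, 4)

p = affine_combination(mu, x, lam, y)
# The symmetric description y + mu*d(y, x) names the same point.
assert p == translate(y, scale(mu, d(y, x)))
assert p == (F(3), F(3, 2))
# With a negative weight the point lies on the line but outside the segment.
assert affine_combination(F(2), x, F(-1), y) == (F(-4), F(-2))
```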
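What about repeating this “combination” process? For example, what
is “the point midway between $z$ and $\tfrac12 x + \tfrac12 y$”? Very conveniently,

$$\tfrac12\left(\tfrac12 x + \tfrac12 y\right) + \tfrac12 z$$

coincides with

$$\tfrac14 x + \tfrac34\left(\tfrac13 y + \tfrac23 z\right)$$

and with

$$\tfrac14 y + \tfrac34\left(\tfrac13 x + \tfrac23 z\right)$$

(Fig. 2.2, Exercise 1a). We can unambiguously call it

$$\tfrac14 x + \tfrac14 y + \tfrac12 z ,$$

multiplying out the brackets. In general, for $\lambda_1, \dots, \lambda_4 \in R$ we can take a
“repeated combination” of $x_1, \dots, x_5 \in X$

$$\lambda_1 x_1 + (1 - \lambda_1)\left( \frac{\lambda_2}{1 - \lambda_1} x_2 + \left(1 - \frac{\lambda_2}{1 - \lambda_1}\right)\left( \frac{\lambda_3}{1 - \lambda_1 - \lambda_2} x_3 + \left(1 - \frac{\lambda_3}{1 - \lambda_1 - \lambda_2}\right)\left( \frac{\lambda_4}{1 - \lambda_1 - \lambda_2 - \lambda_3} x_4 + \left(1 - \frac{\lambda_4}{1 - \lambda_1 - \lambda_2 - \lambda_3}\right) x_5 \right)\right)\right)$$

which multiplies out to

$$\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \lambda_4 x_4 + (1 - \lambda_1 - \lambda_2 - \lambda_3 - \lambda_4) x_5 .$$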
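2.02. Definition. Given $x_1, \dots, x_n \in X$, and $\lambda_1, \dots, \lambda_n \in R$ such that
$\sum_{i=1}^n \lambda_i = 1$, the affine combination

$$\lambda_1 x_1 + \cdots + \lambda_n x_n$$

is defined in terms of 2.01 as

$$\lambda_1 x_1 + (1 - \lambda_1)\left( \frac{\lambda_2}{1 - \lambda_1} x_2 + \left( \cdots + \left( 1 - \frac{\lambda_{n-1}}{1 - \lambda_1 - \cdots - \lambda_{n-2}} \right) x_n \cdots \right) \right)$$

(where, by Exercise 1b, the order in which we take the terms $\lambda_i x_i$ does not
affect the point defined.)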
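
The nested form of Definition 2.02 can be written as a short recursion and compared with the multiplied-out expression. The sketch below assumes the plane with its natural affine structure, and that no partial sum $\lambda_1 + \cdots + \lambda_i$ with $i < n$ equals 1, so that every denominator is non-zero; `two_point`, `combine` and `expanded` are our own illustrative names.

```python
from fractions import Fraction as F

def two_point(mu, x, lam, y):
    """Definition 2.01: mu*x + lam*y := x + lam*d(x, y), with d(x, y) = y - x."""
    assert mu + lam == 1
    return tuple(xi + lam * (yi - xi) for xi, yi in zip(x, y))

def combine(lams, pts):
    """Affine combination built from repeated two-point combinations."""
    if len(pts) == 1:
        return pts[0]
    lam1, rest = lams[0], 1 - lams[0]
    if rest == 0:
        raise ValueError("nested form needs 1 - lam_1 != 0; reorder the terms")
    # The remaining weights are rescaled so that they again sum to 1.
    inner = combine([l / rest for l in lams[1:]], pts[1:])
    return two_point(lam1, pts[0], rest, inner)

def expanded(lams, pts):
    """The multiplied-out expression, computed coordinatewise."""
    return tuple(sum(l * p[i] for l, p in zip(lams, pts))
                 for i in range(len(pts[0])))

lams = [F(1, 2), F(1, 3), F(1, 6)]
pts = [(F(0), F(0)), (F(6), F(0)), (F(0), F(12))]

assert combine(lams, pts) == expanded(lams, pts) == (2, 2)

# Taking the terms in another order gives the same point.
order = [2, 0, 1]
assert combine([lams[i] for i in order], [pts[i] for i in order]) == (2, 2)
```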

The requirement that $\sum_{i=1}^n \lambda_i = 1$ imposes a little extra care in
manipulation. For instance, the statements

$$x = y + w - z , \qquad \tfrac12 x + \tfrac12 y = \tfrac12 w + \tfrac12 z ,$$

are meaningful, since they have names of points on each side. But the
superficially equivalent

$$x - w = y - z , \qquad x + z = y + w , \qquad x + y = w + z$$

equate expressions we have not defined. (We could define them, but expressions
like $x - y$ would have to refer to “points at infinity” - can you see why? -
and would take us into projective, not affine, geometry.)

This gives us an “internal” expression (Exercise 2a) for the affine
hull (1.03) of $S \subseteq X$, as the set of affine combinations of points in $S$. This
is precisely analogous to the two descriptions of linear hulls in I.1.04 and
Exercise I.1.3b. We can also define the convex hull $C(S)$ (Fig. 2.3) as

$$\Big\{ \lambda_1 x_1 + \cdots + \lambda_k x_k \ \Big|\ x_i \in S,\ \lambda_i \geq 0,\ i \in \{1, \dots, k\},\ k \in N,\ \sum_{i=1}^k \lambda_i = 1 \Big\} .$$

Also, convex sets are those $S$ with $C(S) = S$. Now, (Exercise 2b) a flat
has $H(S) = S$ and since $S \subseteq C(S) \subseteq H(S)$, a flat is always convex. Convex
sets are of great practical importance, for instance in linear programming
and control theory. We shall not develop their theory here: but note that intervals in
$R$ are convex.


Exercises II.2 

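1. a) Prove from Definition 2.01 that

$$\lambda(\mu x + (1 - \mu)y) + (1 - \lambda)z$$

b) For any permutation

$$m : \{1, \dots, k\} \to \{1, \dots, k\} : i \mapsto m_i \quad \text{(cf. I.3.06)},$$

with $\lambda_1, \dots, \lambda_k \in R$ s.t. $\lambda_1 + \cdots + \lambda_k = 1$, and $x_1, \dots, x_k \in X$, then

$$\lambda_1 x_1 + (1 - \lambda_1)\left( \frac{\lambda_2}{1 - \lambda_1} x_2 + \cdots + \left(1 - \frac{\lambda_{k-1}}{1 - \lambda_1 - \cdots - \lambda_{k-2}}\right) x_k \cdots \right)$$

$$= \lambda_{m_1} x_{m_1} + (1 - \lambda_{m_1})\left( \frac{\lambda_{m_2}}{1 - \lambda_{m_1}} x_{m_2} + \cdots + \left(1 - \frac{\lambda_{m_{k-1}}}{1 - \lambda_{m_1} - \cdots - \lambda_{m_{k-2}}}\right) x_{m_k} \cdots \right) .$$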


2. a) Prove that the affine hull $H(S)$ of $S \subseteq X$ (1.03) consists exactly of
the set

$$\{ \lambda_1 x_1 + \cdots + \lambda_k x_k \mid x_i \in S,\ i \in \{1, \dots, k\},\ k \in N \} .$$

b) Prove that $S$ is an affine subspace of $X$ if and only if $H(S) = S$.
(cf. Exercise 1.3c)

3. Suppose that $S = \{x_1, \dots, x_k\}$, contained in an affine space $X$, does
not satisfy an equation

$$x_i = \lambda_1 x_1 + \cdots + \lambda_{i-1} x_{i-1} + \lambda_{i+1} x_{i+1} + \cdots + \lambda_k x_k$$

for any $i$. Then using Definition 1.07

$$\dim H(S) = k - 1 ,$$

and $H(S) = X$ if and only if $k = \dim X + 1$.
3. Maps

One attraction of affine combinations is that they are “intrinsic to the space”: 
one could argue that the idea of the midpoint of x and y is more basic than 
“x plus the difference vector from x to y”, which was our definition. It is 
certainly several thousand years older, and one can pinpoint the introduction 
of the more general $\mu x + \lambda y$ to Eudoxus’s theory of proportions. We had the
machinery of Chapter I to hand, however, so Definition 1.01 was technically 
more convenient. 

The structure-minded reader will find it a fruitful exercise to define an
affine space as a set $X$ with a map

$$\Lambda : X \times X \times R \to X$$

to be thought of as

$$(x, y, \lambda) \mapsto (1 - \lambda)x + \lambda y$$

satisfying appropriate axioms, and construct the corresponding $T$ and $d$.
Notice (Fig. 3.1) that by Exercise 1, starting with Definitions 1.01, 2.01

$$d(x, y) = d(x', y') \iff \tfrac12 x + \tfrac12 y' = \tfrac12 x' + \tfrac12 y .$$

So starting from affine combinations, we could define

$$(x, y) \sim (x', y') \iff \tfrac12 x + \tfrac12 y' = \tfrac12 x' + \tfrac12 y ,$$

and prove from the chosen axioms that $\sim$ is an equivalence relation. Then $T$
as the set of equivalence classes

$$[(x, y)] = \{ (x', y') \mid (x, y) \sim (x', y') \}$$

and $d$ as the map

$$d : X \times X \to T : (x, y) \mapsto [(x, y)]$$

should have the structures of vector space (I.1.01) and difference map (1.01)
if good axioms for $\Lambda$ have been picked.



Fig. 3.1 
We leave this programme to the reader, but it motivates our next definition.
With any kind of “set with structure” we are interested in maps
from set to set that “respect the structure”. With vector spaces it was linear
maps; now, it is those that preserve affine combinations.

3.01. Definition. A map $A : (X, T) \to (Y, S)$ between affine spaces (or
$P \to Q$ between convex sets in $X$, $Y$) is affine if for any $x, x' \in X$, $\lambda \in R$
(or $x, x' \in P$, $0 \leq \lambda \leq 1$) it satisfies

$$A((1 - \lambda)x + \lambda x') = (1 - \lambda)Ax + \lambda Ax' .$$

(We shall only want the convex sets case in Chapter IX, with $P$ and $Q$
as intervals in $R$: cf. Exercise 9.)

From the way we built up the meaning of multiple combinations in §2,
it is clear that this implies

$$A(\lambda_1 x_1 + \cdots + \lambda_k x_k) = \lambda_1 Ax_1 + \cdots + \lambda_k Ax_k$$

and (applying Exercise 2.2a) that

$$A(H(S)) = H(A(S)) .$$

Thus $A$ preserves affine combinations and affine hulls: in particular, using
Exercise 2.2b, $A : X \to Y$ carries flats to flats (such as lines to lines, or lines
to points - why to nothing else?). Note that the map taking all of $X$ to the
same $y \in Y$ is a perfectly good affine map, just as the zero map between
vector spaces is linear. Affine maps may squash flat, but never bend.

3.02. Definition. An affine map $A : X \to Y$ takes all pairs of points in $X$
separated by a given free vector $t \in T$ to pairs of points in $Y$ all separated
by the same vector in $S$, which we may call $\mathbf{A}t$. (This is just a rephrasing of
Exercise 2.) Exercise 3 checks that $\mathbf{A}$ is a map $T \to S$ and is linear, so we
may call it the linear part of $A$. Clearly for any $x_0 \in X$,

$$A(x) = A(x_0 + d(x_0, x)) = Ax_0 + \mathbf{A}(d(x_0, x)) , \quad \forall x \in X,$$

so if we know the linear part of $A$ and the image of any point in $X$, we know
$A$ completely. A linear map, indeed, is its own linear part (Exercise 8).
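
To see the linear part at work, here is a small sketch for an affine map of the plane to itself. The particular matrix `M` and offset `c` are arbitrary illustrative choices of ours, as are the helper names; the natural affine structure $d(x, y) = y - x$ is assumed.

```python
from fractions import Fraction as F

M = ((F(2), F(1)),
     (F(0), F(3)))          # an arbitrary matrix, playing the role of the linear part
c = (F(5), F(-1))           # image of the origin

def A(x):
    """The affine map x -> Mx + c."""
    return tuple(sum(M[i][j] * x[j] for j in range(2)) + c[i] for i in range(2))

def d(x, y):
    return tuple(yi - xi for xi, yi in zip(x, y))

def translate(x, t):
    return tuple(xi + ti for xi, ti in zip(x, t))

def linear_part(t, base):
    """The separation d(A(base), A(base + t)) of the images of two points separated by t."""
    return d(A(base), A(translate(base, t)))

t = (F(1), F(2))
x0, x1 = (F(0), F(0)), (F(7), F(-3))

# The image separation does not depend on the base point, and equals M t.
assert linear_part(t, x0) == linear_part(t, x1) == (4, 6)

# A(x) = A(x0) + (linear part)(d(x0, x)).
x = (F(2), F(9))
assert A(x) == translate(A(x0), linear_part(d(x0, x), x0))

# A preserves the affine combination (1 - lam) x + lam x'.
lam = F(1, 3)
def comb(p, q):
    return translate(p, tuple(lam * ti for ti in d(p, q)))
assert A(comb(x, x1)) == comb(A(x), A(x1))
```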

3.03. Definition. An affine map $A : X \to Y$ is an affine isomorphism if
there is an affine map $B : Y \to X$ such that $AB$, $BA$ are identity maps. An
affine isomorphism $X \to X$ is an affine automorphism.

A translation of $X$ is a map of the form $x \mapsto x + t$, for some free vector $t$.
(One can add and scalar multiply translations just like their corresponding
free vectors, and this gives yet another approach to defining an affine space.)
Evidently every translation is an affine automorphism, and the translate of
a subspace (Definition 1.04) is its image under a translation.

3.04. Definition. The image $AX$ of an affine map $A : X \to Y$ is its
set-theoretic image $\{ Ax \mid x \in X \}$. Since $X$ is (trivially) an affine subspace of
itself, $AX$ is a flat of $Y$.

The rank $r(A)$ of $A$ is $\dim(AX) \leq \dim(Y)$. (Definition 1.07)

The nullity $n(A)$ of $A$ is the nullity of the linear part of $A$.

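3.05. Components. Applying the equation after Definition 3.02, it is clear
that fixing origins $0_X$, $0_Y$ for $X$ and $Y$, and bases for $T$, $S$, we can write $A$
in coordinates as

$$A \begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix} = [\mathbf{A}^i_j] \begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix} + \begin{bmatrix} a^1 \\ \vdots \\ a^m \end{bmatrix} .$$

Here $[\mathbf{A}^i_j]$ is the matrix for the linear part $\mathbf{A}$ of $A$ given by the chosen coordinates and
$(a^1, \dots, a^m)$ are the coordinates of $A(0_X)$ in the chart used on $Y$. In individual
components,

$$(Ax)^i = \mathbf{A}^i_j x^j + a^i$$

where $X$ is $n$-dimensional and $Y$ is $m$-dimensional.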


Exercises II.3 

1. a) Prove that $x + \tfrac12 d(x, y') = x' + \tfrac12 d(x', y)$ if and only if $d(x, y) = d(x', y')$.
(Hint: let $d(x', y') = d(x, y) + t$, and observe $d(x, y') = d(x, y) + d(y, x') + d(x', y')$.)

b) Deduce that $d(x, y) = d(x', y')$ if and only if $\tfrac12 x + \tfrac12 y' = \tfrac12 x' + \tfrac12 y$.

2. Deduce from Exercise 1b that if $A : (X, T) \to (Y, S)$ always satisfies
$A(\tfrac12 x + \tfrac12 x') = \tfrac12 Ax + \tfrac12 Ax'$ (in particular, if $A$ is affine), then
$d(x, y) = d(x', y') \Rightarrow d(Ax, Ay) = d(Ax', Ay')$.

3. If $A : (X, T) \to (Y, S)$ is affine then:

a) $\mathbf{A} = \{ (t, s) \mid \exists x \in X \text{ s.t. } d(Ax, A(x + t)) = s \}$ is a mapping $T \to S$.

(So $\mathbf{A}$ satisfies Axioms Fi, Fii. Use Exercise 2 for Fii.)

b) $\mathbf{A}(t_1 + t_2) = \mathbf{A}t_1 + \mathbf{A}t_2$ (choose $x \in X$ and consider $x + 2t_1$, $x + 2t_2$
and the point midway between them.)

c) $\mathbf{A}(\lambda t) = \lambda \mathbf{A}t$ (consider $A(x + \lambda t)$).

4. If $A : X \to Y$ is affine then:

a) For any flat $Y'$ of $Y$, $A^{\leftarrow}(Y')$ is a flat of $X$: in particular, for any
$y \in Y$, $A^{\leftarrow}(\{y\})$ is a (perhaps empty) flat of $X$.

b) For any $y, y' \in AX \subseteq Y$, $\dim(A^{\leftarrow}\{y\}) = \dim(A^{\leftarrow}\{y'\}) = n(A)$.

c) $n(A) + r(A) = \dim X$.

5. a) If $Y', Y'' \subseteq AX$ are parallel subspaces of $Y$ then $A^{\leftarrow}Y'$ and $A^{\leftarrow}Y''$
are parallel subspaces of $X$.

b) Deduce that if $y, y' \in AX$ then $A^{\leftarrow}\{y\}$ and $A^{\leftarrow}\{y'\}$ are parallel flats
of $X$.

6. An affine map is an affine isomorphism if and only if its linear part is
an isomorphism, and hence if and only if it is a bijection.

7. Affine maps $A, A' : X \to Y$ have the same linear part if and only if
$A' = T \circ A$, where $T$ is a translation $Y \to Y$.

8. Let $X$, $Y$ be vector spaces, alias $\mathcal{X}$, $\mathcal{Y}$ when considered in the natural
way as affine spaces (with free spaces $X$, $Y$). Show that a map $M : X \to Y$
is linear if and only if $M(0_X) = 0_Y$ and, considered as
$M : \mathcal{X} \to \mathcal{Y}$, it is affine. Deduce that $M$ then coincides with its linear
part $\mathbf{M}$ as a map between sets.

9. a) Any affine map $A$ between convex sets $P \subseteq X$, $Q \subseteq Y$ is the restriction
of an affine map $\tilde A : X \to Y$, and $\tilde A$ is uniquely fixed by $A$ if and only
if $H(P) = X$.

b) If $P$, $Q$ are intervals in $R$, and $R$ has its natural affine structure with
vector space $R$, then for any affine map $A : P \to Q$ there are unique
numbers $a_1, a_2 \in R$ such that

$$A(p) = a_1 p + a_2 \quad \text{for all } p \in P.$$
“A duality of what is discriminated takes place in

spite of the fact that object and subject cannot be defined.” 

Lankavatara Sutra 

1. Contours, Co- and Contravariance, Dual Basis 

1.01. Notation. Throughout this chapter X and Y will denote finite-dimen¬ 
sional real vector spaces, and n and m their respective dimensions. 

1.02. Linear Functionals. Just as the linear maps from $X$ to itself have a
special role and a special name, so do those from $X$ to the field of scalars, $R$.
They are called linear functionals on $X$, or dual or covariant vectors. (The
term “covariant” is to distinguish them from the vectors in $X$, which are
called contravariant. This is related to the “backwardness” of the formula for
changing basis discussed in I.2.08; we shall look at it in more detail in 1.07.)
The space $L(X; R)$ of linear functionals on $X$ forms a vector space (as does
any $L(X; Y)$; cf. I.2.01) which will generally be denoted by $X^*$ and called
the dual space of $X$.

Geometrically, a linear functional may best be thought of by its “contours”.
The geographical function “$H$ = height above sea-level” is very
effectively specified on a surface by drawing lines of constant height - that
is, by drawing the sets $H^{\leftarrow}(h)$ for various values of $h$ (Fig. 1.1). Similarly,
a non-zero linear functional $f$ on an $n$-dimensional space will have contours
that are lines for $n = 2$, planes for $n = 3$ (Fig. 1.2), and parallel affine
hyperplanes in general. (They will be parallel by Exercise II.2.3, and hyperplanes
since $f \neq 0 \Rightarrow fX = R \Rightarrow r(f) = 1 \Rightarrow n(f) = \dim(X) - 1$ by Exercise II.4d.)

1.03. Dual Maps. From a linear map $A : X \to Y$, we do not get naturally
any map $X^* \to Y^*$; a function $A$ defined on $X$ cannot be expected to change
a function $f \in X^*$ also defined on $X$ to one defined on $Y$. However, we have
a natural way to get a map the other way:

$$A^* : Y^* \to X^* : f \mapsto f \circ A ,$$

we call $A^*$ the dual map to $A$.

(This kind of reversal of direction is also called “contravariant”, a habit 
that arose in a different part of mathematics entirely and conflicts with the 
usage for vectors, turning it from an oddity to a nuisance. However, both are 
$$\begin{aligned}
[f] &= [f_1 \ f_2 \ \cdots \ f_n] \\
&= f_1 [1 \ 0 \ \cdots \ 0] + f_2 [0 \ 1 \ 0 \ \cdots \ 0] + \cdots + f_n [0 \ 0 \ \cdots \ 0 \ 1] \\
&= f_i [b^i] \\
&= [f_i b^i] .
\end{aligned}$$

Now, $f = f_i b^i$ in a unique (why?) way, so $\beta^* = \{b^1, \dots, b^n\}$ is a basis
for $X^*$, giving $f$ coordinates $(f_1, \dots, f_n)$, and so
$\dim X^* = n = \dim X$. (This is in fact just a special case of the general
result (I.2.07) that $\dim(L(X; Y)) = \dim X \dim Y$, since $X^*$ is just $L(X; R)$
and $\dim R = 1$.) □

1.05. Remark. It is tempting to identify $X^*$ with $X$, since using the basis
$b^1, \dots, b^n$ constructed in the proof of 1.04 we can set

$$B : b_i \mapsto b^i , \quad \text{for } i = 1, \dots, n$$

and by I.2.05, I.2.06 this determines an isomorphism $B : X \to X^*$. However,
this has great disadvantages, because the isomorphism depends very much
on the choice of basis. Moreover, dual maps become confusing to talk about,
because if $X^*$ “is” just $X$, and $Y^*$ “is” just $Y$, by virtue of isomorphisms $B$,
$B'$ defined in this way, $A^*$ goes from $Y$ to $X$; you identify $A^*$ with $B^{\leftarrow} A^* B'$.

$$\begin{array}{ccc}
X^* & \xleftarrow{\ A^*\ } & Y^* \\
{\scriptstyle B}\uparrow & & \uparrow{\scriptstyle B'} \\
X & \xrightarrow{\ A\ } & Y
\end{array}$$

But if we choose a new basis $\tfrac12\beta = \tfrac12 b_1, \dots, \tfrac12 b_n$ for $X$ (and leave $\beta'$ alone)
we get a new basis for $X^*$, whose $i$-th member is a functional taking $\tfrac12 b_i$,
instead of $b_i$, to 1. It thus takes $b_i$ to 2, so the new basis $(\tfrac12\beta)^*$ is $2b^1, \dots, 2b^n$,
and the new isomorphism $B'' : X \to X^*$ defined by $\tfrac12 b_i \mapsto 2b^i$, $i = 1, \dots, n$,
is equal to $4B$. Therefore

$$(B'')^{\leftarrow} A^* B' = \tfrac14 B^{\leftarrow} A^* B' ,$$

not at all the map just identified with $A^*$, though constructed in the same
way. Moreover, identification of $X^*$ with $X$ is a particularly bad habit if
carried over to infinite dimensions, where there may be no isomorphism, not
just no natural one.

1.06. Dual Basis. Although the dual basis $\beta^* = b^1, \dots, b^n$ constructed for
$X^*$ in 1.04 should not be used to identify $X^*$ with $X$, it can be used very
effectively to simplify the algebra. Given a vector $x \in X$ and a functional
$f \in X^*$, with coordinates $(x^1, \dots, x^n)$ and $(f_1, \dots, f_n)$ according to $\beta$ and
$\beta^*$, we have

$$\begin{aligned}
f(x) &= (f_i b^i)(x^j b_j) \\
&= f_i x^j \big(b^i(b_j)\big) \\
&= f_i x^j \delta^i_j \\
&= f_i x^i
\end{aligned}$$

- a nice simple formula. So when we have a basis $\beta$ chosen for $X$ we usually
choose the dual basis $\beta^*$ for $X^*$. Notice that the dual basis $\varepsilon^* = e^1, \dots, e^n$
to the standard basis $\varepsilon$ for $R^n$ (cf. I.1.10) consists simply of the coordinate
functions:-

$$e^i : R^n \to R : (x^1, \dots, x^n) \mapsto x^i$$

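Now, if we have bases $\beta = b_1, \dots, b_n$ for $X$, $\beta' = b'_1, \dots, b'_m$ for $Y$, giving
an $m \times n$ matrix $A = [A]^{\beta'}_{\beta}$ for $A : X \to Y$, what is the matrix $[A^*]^{\beta^*}_{\beta'^*}$?

If $f = (f_1, \dots, f_m)$ in “dual coordinates” on $Y^*$, then

$$A^* f = A^*(f_j b'^j) .$$

Hence,

$$\begin{aligned}
(A^* f) b_i &= (A^*(f_j b'^j)) b_i \\
&= f_j b'^j(A b_i) \quad \text{(definition)} \\
&= f_j b'^j(a^1_i, \dots, a^m_i) \quad \text{since } b_i = (0, \dots, 1, \dots, 0) \text{ with the 1 in the } i\text{-th place} \\
&= f_j a^j_i \quad \text{(definition of } b'^j),
\end{aligned}$$

for any $b_i \in \beta$. Therefore in dual coordinates on $X^*$, $A^* f$ is $(a^j_1 f_j, \dots, a^j_n f_j)$.
This is exactly the result of applying the $n \times m$ matrix $A^t$, the transpose of $A$,
obtained by switching rows and columns in $A$:-

$$\begin{bmatrix} a^1_1 & a^1_2 & \cdots & a^1_n \\ \vdots & \vdots & & \vdots \\ a^m_1 & a^m_2 & \cdots & a^m_n \end{bmatrix}
\quad \text{becomes} \quad
\begin{bmatrix} a^1_1 & \cdots & a^m_1 \\ a^1_2 & \cdots & a^m_2 \\ \vdots & & \vdots \\ a^1_n & \cdots & a^m_n \end{bmatrix} .$$

In formulae,

$$[A^*] = [A]^t , \qquad [A^t]^i_j = [A]^j_i .$$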

Thus the use of dual bases nicely simplifies the finding of dual maps. 
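
A small numerical sketch of this rule, assuming standard bases on $R^3$ and $R^2$ and their duals; the matrix entries and the helper names `apply`, `transpose` and `pair` are arbitrary illustrative choices of ours.

```python
A = [[1, 2, 0],
     [3, -1, 4]]            # an m x n = 2 x 3 matrix for A : R^3 -> R^2

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    """Switch rows and columns."""
    return [list(row) for row in zip(*M)]

def pair(f, x):
    """f(x) = f_i x^i in a basis and its dual."""
    return sum(fi * xi for fi, xi in zip(f, x))

f = [5, 7]                  # a functional on Y, in dual coordinates
x = [1, -2, 3]              # a vector in X

A_star_f = apply(transpose(A), f)      # dual coordinates of A* f on X*

# (A* f)(x) = f(Ax): the defining property of the dual map.
assert pair(A_star_f, x) == pair(f, apply(A, x)) == 104

# Component i of A* f is f applied to the image of the i-th basis vector.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert A_star_f == [pair(f, apply(A, b)) for b in basis]
```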
1.07. Change of Basis. Since it is useful to have the basis of $X^*$ dual to
that of $X$, when we change the basis of $X$ from $\beta$ to $\beta'$ we want to change
that of $X^*$ from $\beta^*$ to $(\beta')^*$. Suppose then that as usual we are given the
new basic vectors $b'_i = b^j_i b_j = (b^1_i, \dots, b^n_i)$ in terms of the old basis. To
change the representation of a vector $x$ with old coordinates $(x^1, \dots, x^n)$, we
have to work out

$$\begin{bmatrix} b^1_1 & \cdots & b^1_n \\ \vdots & & \vdots \\ b^n_1 & \cdots & b^n_n \end{bmatrix}^{\leftarrow} \begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix} .$$

The $i$-th column of the matrix to be inverted is just the old coordinates of $b'_i$
(cf. I.2.08). If we have $f \in X^*$ represented by $(f_1, \dots, f_n)$ in the coordinates
dual to the old basis, what are its new coordinates? To get them we want
the matrix $[I_{X^*}]^{(\beta')^*}_{\beta^*} = C^*$ for short. Evidently $I_{X^*} = (I_X)^*$, so by 1.06 the
matrix $C^*$ is exactly the transpose of the matrix $[I_X]^{\beta}_{\beta'}$, uninverted.

So to find the new coordinates of $f$, just work out

$$\begin{bmatrix} b^1_1 & \cdots & b^n_1 \\ \vdots & & \vdots \\ b^1_n & \cdots & b^n_n \end{bmatrix} \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} ,$$

where the rows of the matrix are given by the old coordinates of the $b'_i$’s.

This is what is meant by the statement that dual vectors “transform
covariantly”, since

$$f'_i = b^j_i f_j \quad \text{compared with} \quad b'_i = b^j_i b_j$$

shows that the dual vectors “co-vary” with the basis in transformation of their
components. Contrariwise, we need the inverse matrix for the transformation
of ordinary or “contravariant” vectors:

$$(x')^i = \tilde b^i_j x^j , \quad \text{where } \tilde b^i_j b^j_k = \delta^i_k ,$$

as shown in I.2.08.
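
The two transformation rules can be compared on a small example. In this sketch the matrix `B` (whose $i$-th column holds the old coordinates of the new basis vector $b'_i$) and the sample coordinates are arbitrary choices of ours; the point is only that components of vectors use the inverse matrix, components of functionals use the transpose, and the value $f(x) = f_i x^i$ is unchanged.

```python
from fractions import Fraction as F

B = [[F(2), F(1)],
     [F(1), F(1)]]          # column i = old coordinates of the new basis vector b'_i

def inverse2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

x_old = [F(3), F(5)]        # contravariant components of a vector x
f_old = [F(4), F(-1)]       # covariant components of a functional f

x_new = apply(inverse2(B), x_old)      # vectors: inverse matrix
f_new = apply(transpose(B), f_old)     # functionals: transpose, uninverted

assert x_new == [-2, 7]
assert f_new == [7, 3]

# The new coordinates really do rebuild x from the new basis vectors.
new_basis = transpose(B)               # row i = old coordinates of b'_i
rebuilt = [sum(x_new[i] * new_basis[i][k] for i in range(2)) for k in range(2)]
assert rebuilt == x_old

# f(x) = f_i x^i is the same number in either coordinate system.
pair = lambda f, x: sum(fi * xi for fi, xi in zip(f, x))
assert pair(f_old, x_old) == pair(f_new, x_new) == 7
```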

1.08. Notation. Consistently with what we have used so far, and with physical
practice, lower indices for the components relative to a basis of a single
object, such as in

$$a = (a_1, \dots, a_n) , \qquad a^j = (a^j_1, \dots, a^j_n)$$

will indicate covariance, and upper indices such as in

$$b = (b^1, \dots, b^n) , \qquad b_j = (b^1_j, \dots, b^n_j)$$

will refer to contravariance. (These examples emphasise that there may be
more indices around, when the vector is one of a family labelled by these
further indices - as with the $b_i$’s in a basis.)

Thus in general $a_i b^i$ will refer to the value $a(b)$ of $a$ applied to $b$ (or
$b(a)$, via the identifications in the next section). The reader will notice that
the numbering $b_1, \dots, b_n$ or $b^1, \dots, b^n$ of the vectors in a basis (which are not
the components of an object) is done with indices the other way up. This
is peculiar, but standard. It permits us to use the summation convention
not only to represent $a(b)$ but - as we used it in Chapter I - for linear
combinations, like

$$(a^1, \dots, a^n) = a^i b_i .$$

Since we shall normally suppress reference to the basis that we are using and 
work with n-tuples (or, for instance, mxnxpxq arrays) of numbers defined 
by the use of it, this should not cause too much confusion. 

(Warning: it is in fact possible to regard the $n$ vectors in a basis as the
components of something called an $n$-frame. At that point the summation
convention becomes more trouble than it’s worth. We shall simply dodge this
problem by only using “frame” in the traditional physicists’ sense as short
for “frame of reference”. That is a particular choice of basis or coordinate
system, two notions which we sometimes wish to separate (cf. II.1.08), neither
of which is an object with variance at all.)

1.09. Double Duals. Though there is no natural map $X \to X^*$ (“natural”
meaning “independent of arbitrary choices”; this intuitive idea can be
replaced by the beautiful and useful formalisation that is category theory, but
we shall skip the formalities here) there is a very nice map

$$\theta : X \to (X^*)^* : x \mapsto [f \mapsto f(x)] \quad \text{(cf. Exercise 1)}.$$

Now for any basis $\beta = b_1, \dots, b_n$ for $X$ with dual basis $\beta^* = b^1, \dots, b^n$ for $X^*$
and the basis $(\beta^*)^* = (b^1)^*, \dots, (b^n)^*$ dual to that for $(X^*)^*$, the isomorphism
$X \to (X^*)^*$ defined on bases by $b_i \mapsto (b^i)^*$ is exactly the map $\theta$. For,

$$\begin{aligned}
(b^i)^*(f) &= (b^i)^*(f_1, \dots, f_n) \\
&= f_i \\
&= (f_1 b^1 + \cdots + f_n b^n)(b_i) , \quad \text{since } b^j(b_i) = \delta^j_i \\
&= f(b_i) \\
&= (\theta(b_i))(f) .
\end{aligned}$$
Thus, $\theta$ is an isomorphism, and so thoroughly “natural” that we can use it
to identify $(X^*)^*$ with $X$, and $(b^i)^*$ with $b_i$, without ever creating difficulties
for ourselves. We shall simply regard $X$ and $X^*$ as each other’s duals, and
forget about $(X^*)^*$; in fact this is why the word “dual” is used here at all.
The practical-minded among us may find comfort in this identification. For,
we started with abstract elements in $X$ but dual vectors had a role, they
attacked vectors by definition; now we have a role for vectors, they attack
dual vectors!

CAUTION: The above argument rested firmly on the finite-dimensionality
of $X$. We can always define $\theta$, and it will always be injective, but without
finite bases around it is not always surjective. This is sometimes not realised
in physics texts, particularly earlier ones such as [Dirac].


Exercises III.1

1. a) Prove that for any $x \in X$ there is a linear map

$$X^* \to R : f \mapsto f(x) .$$

b) Prove that the map

$$\theta : X \to (X^*)^* ,$$

which by a) takes values in the right space $(X^*)^*$ and is thus well
defined, is linear.

2. Prove, by considering matrices for the operators $A$ and $A^*$ in any
basis and its dual and applying Exercise I.3.2a) that $\det A^* = \det A$.
“He who is wise sees near and far
As the same. 

Does not despise the small 
Or value the great: 

Where all standards differ 
How can you compare?” 

Chuang Tzu 


1. Metrics 

So far, we have worked with all non-zero vectors on an equal footing,
unconcerned with the idea of their length except in illustration (as in I.3.14), or of
angles between them. All the ideas we have considered have been indepen¬ 
dent of these concepts, and for instance either of the bases in Fig 1.1 can be 
regarded as an equally good basis for the plane. Now, the notions of length 
and angle are among the most fruitful in geometry, and we need to use them 
in our theory of vector spaces. But this means adding a “length structure” 
to each vector space, and since it turns out that many are possible we must 
choose one - and define what we mean by one. 

To motivate this, let us look at $\mathbb{R}^2$ with all its usual Euclidean geometry
of lengths and angles, and consider two of its non-zero vectors $v$ and $w$, in
coordinates $(v^1, v^2)$ and $(w^1, w^2)$. By Pythagoras’s Theorem, the lengths $|v|$,
$|w|$ of $v$, $w$ are $\sqrt{(v^1)^2 + (v^2)^2}$, $\sqrt{(w^1)^2 + (w^2)^2}$ respectively. The angle $\alpha$
between them may be found by the cosine formula for a triangle:

$$u^2 = |v|^2 + |w|^2 - 2|v|\,|w| \cos\alpha ,$$

where $u$ is the length of the third side, joining the ends of $v$ and $w$.
Applying Pythagoras, this gives

$$(v^1 - w^1)^2 + (v^2 - w^2)^2 = \left((v^1)^2 + (v^2)^2\right) + \left((w^1)^2 + (w^2)^2\right) - 2|v|\,|w| \cos\alpha .$$

Multiplying out and cancelling, we get

$$-2v^1w^1 - 2v^2w^2 = -2|v|\,|w| \cos\alpha .$$

So,

$$v^1w^1 + v^2w^2 = |v|\,|w| \cos\alpha .$$


The left hand side of this involves coordinates, but the right hand side involves
only Euclidean, coordinate-free, ideas of length and angle. Denoting
$|v|\,|w| \cos\alpha$ for short by $v \cdot w$, we can get both lengths and angles from it
directly:

$$|v| = (v \cdot v)^{1/2} , \qquad \alpha = \cos^{-1}\left(\frac{v \cdot w}{|v|\,|w|}\right) .$$

It has nice neat properties; $v \cdot w = w \cdot v$, and “$w$ goes to $v \cdot w$” is a linear
functional for any $v$ (Exercise 1). In coordinates it has a very simple formula

$$v \cdot w = v^1w^1 + v^2w^2 .$$

This depended for its proof only on the $x^1$ and $x^2$ axes being at right angles
(so that we could apply Pythagoras) and the scales of them being right (the
basis vectors $(1,0)$ and $(0,1)$ being actually of length 1). Also we can use
it, itself, to define these conditions. For two non-zero vectors $v$, $w$ with an
angle $\alpha$ between them have

$$\begin{aligned}
v \cdot w = 0 &\iff |v|\,|w| \cos\alpha = 0 \\
&\iff \cos\alpha = 0 , \quad \text{since } |v| \neq 0 \neq |w| \\
&\iff \alpha = \frac{\pi}{2}
\end{aligned}$$

and a basis vector $b$ is of length 1 exactly when $b \cdot b = 1$. So the argument
establishing the formula works for any basis $b_1, b_2$ for $\mathbb{R}^2$ provided that

$$b_i \cdot b_j = \delta_{ij} .$$

In any such basis, $v \cdot w$ will have the formula $v^1w^1 + v^2w^2$. So this “dot
product” carries complete information about lengths and angles, and defines
neatly in what bases it has a simple formula. It looks, then, like a good
candidate for a “length structure”: all that remains is to formalise it. So,
on to the definition - generalising while we’re at it, because we shall want
a different sort of “length” for vectors in a “timelike direction” than for
“spacelike” ones, when we come to spaces that model physical measurements
(cf. 1.04). Moreover we shall find such generalised structure on, for example,
the space of $2 \times 2$ matrices (cf. IX.§6).

1.01. Definition. A bilinear form on a vector space $X$ is a function

$$F : X \times X \to \mathbb{R}$$

which is “linear in each variable separately”. That is to say it satisfies

B i) $F(x + x', y) = F(x, y) + F(x', y)$

$F(x, y + y') = F(x, y) + F(x, y')$

B ii) $F(xa, y) = aF(x, y) = F(x, ya)$.

The geometrical significance of a bilinear form depends on what further properties
it has (the “dot product” discussed above is a bilinear form, but so is
$(v, w) \mapsto 0$, for instance; we need more conditions on a “length structure”
than just bilinearity). A bilinear form on $X$ is

(i) symmetric if $F(x, y) = F(y, x)$ for all $x, y \in X$.

(ii) anti-symmetric (or skew-symmetric) if $F(x, y) = -F(y, x)$ for all
$x, y \in X$.

(iii) non-degenerate if “$F(x, y) = 0$ for all $y \in X$” implies $x = 0$.

(iv) positive definite if $F(x, x) > 0$ for all $x \neq 0$.

(v) negative definite if $F(x, x) < 0$ for all $x \neq 0$.

(vi) indefinite if not either positive or negative definite.

The most significant types of bilinear forms are among the non-degenerate
ones. Specifically:

(vii) A metric tensor on $X$ is a symmetric non-degenerate bilinear form.
We will often follow physicists’ practice in shortening this to just
“metric”, despite a certain risk (VI.1.02) of confusion.

(viii) An inner product on $X$ is a positive or negative definite metric
tensor (cf. Exercise 3). We shall always take it to be positive unless
otherwise indicated: there is no essential difference since a change
of sign changes one to the other, without altering the geometry.

(ix) A symplectic structure on $X$ is a skew-symmetric non-degenerate
bilinear form.

(We shall not be concerned with symplectic forms here, but they play a
central role in classical mechanics. See for instance [Abraham and Marsden],
[Maclane (1)], or for a brief exposition [Maclane (2)].)

We denote the space of all bilinear forms on $X$ by $L^2(X; \mathbb{R})$ or $L(X, X; \mathbb{R})$
(cf. Exercise 4).

If $S$ is a subspace of $X$, then $F$ is symmetric/anti-symmetric/.../
symplectic on $S$ according as the restriction

$$S \times S \to \mathbb{R} : (x, y) \mapsto F(x, y)$$

is symmetric/anti-symmetric/.../symplectic (cf. Exercise 5).

It will often save writing to call a subspace on which a metric tensor is
non-degenerate a non-degenerate subspace of $X$.


1.02. Definition. A metric vector space $(X, G)$ is a vector space $X$ with a
metric tensor $G : X \times X \to \mathbb{R}$. In particular:

An inner product space $(X, G)$ is a vector space $X$ with an inner product
$G : X \times X \to \mathbb{R}$.

For a given metric vector space $(X, G)$ we shall often abbreviate $G(x, y)$
to $x \cdot y$ and $(X, G)$ to $X$, when it is clear by context which metric tensor is
involved.

We shall reserve the symbol $G$ exclusively for metric tensors (including
inner products).

1.03. Definition. The standard inner product on $\mathbb{R}^n$ is defined by

$$(x^1, \ldots, x^n) \cdot (y^1, \ldots, y^n) = x^1y^1 + \cdots + x^ny^n = \sum_{i=1}^{n} x^iy^i .$$

Notice that we cannot use the summation convention here, since both
sets of indices are upper; $x$ and $y$ are both contravariant vectors. The summation
convention operates where we are combining something covariant with
something contravariant, $a(b) = a_ib^i$ say (cf. III.1.08): an operation which
depends only on general vector space definitions, and has this formula with
respect to any basis and its dual. An inner product or metric tensor is extra
structure. To give it a nice formula we must have a basis nice with respect
to it, as we indicated at the beginning of the chapter. We examine this more
precisely in §3.

The Lorentz metric on $\mathbb{R}^4$ is defined by

$$(x^0, x^1, x^2, x^3) \cdot (y^0, y^1, y^2, y^3) = x^0y^0 - x^1y^1 - x^2y^2 - x^3y^3 .$$

The indices run 0-3 instead of 1-4 by convention, for no particular reason
except to make the odd coordinate out in the formula more distinctive. This
metric originates in the physics discussed in Chapter 0.§3. For a single vector

$$x = (x^0, x^1, x^2, x^3)$$

it gives us

$$x \cdot x = (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2 ,$$

a more systematic expression of the relativistically invariant quantity
$t^2 - x^2 - y^2 - z^2$ that we encountered before. The analogy with the Euclidean
$|v|\,|w| \cos\alpha$, for the dot product of two distinct vectors, can be elaborated
using $\cosh\alpha$ instead. We explore some of the geometry behind this in
Chapter IX.§6.

Caution: some authors use $x^1y^1 + x^2y^2 + x^3y^3 - x^4y^4$ (essentially the
negative of the metric above) as “the” Lorentz metric. And it is not customary
in the journals to mention which has been chosen: you just have to
work it out. We shall mention the differences made by this choice at the
appropriate points.

The determinant metric on $\mathbb{R}^4$ is defined by

$$\begin{aligned}
x \cdot y &= (x^1, x^2, x^3, x^4) \cdot (y^1, y^2, y^3, y^4) \\
&= \tfrac{1}{2}(x^1y^4 + x^4y^1) - \tfrac{1}{2}(x^2y^3 + x^3y^2) .
\end{aligned}$$

For this metric

$$x \cdot x = \det \begin{bmatrix} x^1 & x^2 \\ x^3 & x^4 \end{bmatrix} .$$

The determinant of $n \times n$ matrices for $n \neq 2$ is not associated in this way
with a metric tensor on $\mathbb{R}^{n^2}$. But the particular case of $n = 2$ gives us, in
Chapter IX, a short cut to an explicit example of indefinite geometry in Lie
group theory that is important in itself.
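
The three metrics of 1.03 are easy to experiment with numerically. The following is a minimal sketch, not part of the original text: the function names are our own, and the index pattern used for the determinant metric is an assumption, chosen so that $x \cdot x$ equals the determinant of the $2 \times 2$ matrix with rows $(x^1, x^2)$ and $(x^3, x^4)$.

```python
# Illustrative sketch only: names are hypothetical, and the index pattern of
# determinant_metric is an assumption chosen so that x.x = det [[x1, x2], [x3, x4]].

def standard(x, y):
    """Standard inner product on R^n: the sum of x^i y^i."""
    return sum(a * b for a, b in zip(x, y))

def lorentz(x, y):
    """Lorentz metric on R^4, with the odd coordinate out placed first."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

def determinant_metric(x, y):
    """Determinant metric on R^4, reading (x1, x2, x3, x4) as [[x1, x2], [x3, x4]]."""
    return 0.5 * (x[0] * y[3] + x[3] * y[0]) - 0.5 * (x[1] * y[2] + x[2] * y[1])

def det2(x):
    """Determinant of the 2 x 2 matrix [[x1, x2], [x3, x4]]."""
    return x[0] * x[3] - x[1] * x[2]

x = (2.0, 1.0, 3.0, 5.0)
print(standard(x, x))                       # sum of squares
print(lorentz(x, x))                        # (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2
print(determinant_metric(x, x) == det2(x))  # the "dot square" is the determinant
```

All three functions are symmetric and bilinear by inspection; only the first is positive definite.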

1.04. Definition. In a vector space $X$ with metric tensor $G$:

The length $|x|_G$ of the vector $x$ is $\sqrt{x \cdot x}$. (We shall suppress the $G$
if only one metric is in question.) Notice that with an indefinite metric a
non-zero vector may have positive, zero or imaginary length. For example,
in the Lorentz metric,

$$|(1,0,0,0)| = 1 , \qquad |(1,0,1,0)| = 0 , \qquad |(0,0,1,0)| = \sqrt{-1} .$$

For this reason $x \cdot x$ is far more important than $|x|$ since the imaginary numbers
are adventitious; they obscure the essentially real (rather than complex)
structure in use.

In the situation of Chapter 0.§3, a vector labelled $(1,0,0,0)$ or $(-1,0,0,0)$
by some observer represents a separation purely in time from the origin, with
no difference in spatial position - according to that observer. These vectors
have positive Lorentz dot product with themselves. On the other hand a
point labelled $(0, x^1, x^2, x^3)$ by someone, with a “purely spatial” separation
from the origin according to this label, gives a negative number. Finally, possible
“arrival” points for light flashes with “departure” at $(0,0,0,0)$ or vice
versa give zero, by the Principle of Relativity (as we saw in Chapter 0.§3). We
shall see (Exercise 3.5) that the sign of $x \cdot x$ completely determines whether
the spatial or temporal part of the separation $x$ can be eliminated by a suitable
choice of coordinates. Borrowing language from this case even when not
thinking of physics we shall call $x$

timelike if $x \cdot x > 0$
spacelike if $x \cdot x < 0$
lightlike or null if $x \cdot x = 0$.
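
The trichotomy above depends only on the sign of $x \cdot x$, which the following sketch makes explicit for the Lorentz metric. The helper name `classify` is hypothetical; note that, taken literally, the definition also labels the zero vector as null.

```python
# Illustrative sketch: classify a vector of R^4 by the sign of its Lorentz "dot square".

def lorentz(x, y):
    """Lorentz metric on R^4, timelike coordinate first."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

def classify(x, metric=lorentz):
    """Return 'timelike', 'spacelike' or 'null' according to the sign of x . x."""
    q = metric(x, x)
    if q > 0:
        return "timelike"
    if q < 0:
        return "spacelike"
    return "null"

for v in [(1, 0, 0, 0), (-1, 0, 0, 0), (0, 2, 1, 3), (1, 0, 1, 0)]:
    print(v, classify(v))
```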

1.05. Examples. For the sake of the examples they provide, we introduce
here the (non-standard because there are no standard ones) symbols $H^2$ for
$\mathbb{R}^2$ with the metric

$$(x^0, x^1) \cdot (y^0, y^1) = x^0y^0 - x^1y^1$$

and $H^3$ for $\mathbb{R}^3$ with the metric

$$(x^0, x^1, x^2) \cdot (y^0, y^1, y^2) = x^0y^0 - x^1y^1 - x^2y^2 .$$

In $H^2$ the null vectors are all those of the form $(x, x)$ and $(x, -x)$. In
$H^3$ the null vectors are those $(x^0, x^1, x^2)$ with $(x^1)^2 + (x^2)^2 = (x^0)^2$. Fig. 1.3
shows $x$ as null, $y$ spacelike and $z$ timelike in each diagram. For $\mathbb{R}^4$ with
the Lorentz metric (which we shall call Lorentz space and denote by $L^4$) a
similar picture is true but hard to draw.

[Fig. 1.3]

The set $\{x \mid x \cdot x = 0\}$ of null vectors is called the null cone or light
cone of $X$: it is never a subspace of $X$ with any non-degenerate indefinite
metric (Exercise 6).

1.06. Definition. A norm on a vector space $X$ is a function

$$X \to \mathbb{R} : x \mapsto \|x\|$$

such that for all $x, y \in X$ and $a \in \mathbb{R}$,

N i) $\|x\| = 0$ implies $x = 0$.

N ii) $\|xa\| = |a|\,\|x\|$.

N iii) $\|x + y\| \le \|x\| + \|y\|$.

A partial norm satisfies (N ii) but not necessarily (N iii), and only
N’i) $\|x\| \ge 0$ for all $x \in X$
instead of (N i) (cf. Exercise 7a).

On an inner product space $(X, G)$ we have a norm given exactly by
length, $|x| = \|x\|_G = +\sqrt{x \cdot x}$ (Exercise 7b) but for a general metric vector
space $\sqrt{x \cdot x}$ need not be real, so that this does not define a function $X \to \mathbb{R}$.
We can however define, for a metric vector space $(X, G')$,

$$\|x\|_{G'} = +\sqrt{|G'(x, x)|} .$$

If $G'$ is an inner product this coincides with the length and we shall use $|\ |$
and $\|\ \|$ indifferently; in general $\|\ \|_{G'}$ is a partial norm (Exercise 7c).

In any metric vector space $(X, G)$ we shall abbreviate $\|\ \|_G$ to $\|\ \|$, when
possible without confusion. We shall call $\|x\|$ the size of $x$, as against the
length $|x|$.

A unit vector $x$ in a metric vector space is one such that $\|x\| = 1$.

Any non-null vector $x$ may be normalised to give the unit vector in
the same direction.

1.07. Lemma. In any inner product space $(X, G)$ we have, for any $x, y \in X$

$$x \cdot y \le |x|\,|y|$$

with equality for $x, y$ non-zero if and only if $y = xa$, for some $a \in \mathbb{R}$ (when
the two vectors are collinear).

(This is obviously necessary to make possible the equation $x \cdot y =
|x|\,|y| \cos\alpha$ of the remarks opening this chapter. It is called the Schwarz
inequality and it is false for $\|\ \|_G$ when $G$ is indefinite.)

Proof. For any $a \in \mathbb{R}$,

$$\begin{aligned}
(xa - y) \cdot (xa - y) &= xa \cdot xa - y \cdot xa - xa \cdot y + y \cdot y \\
&= (x \cdot x)a^2 - (2x \cdot y)a + y \cdot y .
\end{aligned}$$

Since $G$ is positive definite, $z \cdot z \ge 0$ for all $z$, and so in particular

$$(x \cdot x)a^2 - (2x \cdot y)a + y \cdot y \ge 0 \quad \text{for all } a.$$

Therefore the quadratic equation

$$(x \cdot x)a^2 - (2x \cdot y)a + y \cdot y = 0$$

in $a$ cannot have distinct real roots, hence

$$\begin{aligned}
(2x \cdot y)^2 - 4(x \cdot x)(y \cdot y) &\le 0 , \\
(x \cdot y)^2 &\le (x \cdot x)(y \cdot y) , \\
|x \cdot y| &\le \sqrt{x \cdot x}\,\sqrt{y \cdot y} , \\
x \cdot y &\le \|x\|\,\|y\| .
\end{aligned}$$

In the case of equality, the equation has exactly one real root ($a = |y|/|x|$)
and for this value we have precisely

$$(xa - y) \cdot (xa - y) = 0 .$$

Hence by the definiteness of $G$

$$xa - y = 0 , \qquad xa = y . \qquad □$$
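
A quick numerical illustration of Lemma 1.07, and of the remark that it fails for the size $\|\ \|_G$ when $G$ is indefinite, can be sketched as follows. The function names are ours, and `h2` assumes the metric $x^0y^0 - x^1y^1$ on $H^2$.

```python
import math

# Illustrative sketch: the Schwarz inequality holds for an inner product
# but can fail for the "size" built from an indefinite metric.

def dot(x, y):
    """Standard inner product on R^2."""
    return x[0] * y[0] + x[1] * y[1]

def h2(x, y):
    """Assumed metric on H^2: x^0 y^0 - x^1 y^1."""
    return x[0] * y[0] - x[1] * y[1]

def size(x, metric):
    """The size ||x|| = +sqrt(|G(x, x)|) of 1.06."""
    return math.sqrt(abs(metric(x, x)))

x, y = (3.0, 4.0), (1.0, 2.0)
print(dot(x, y) <= size(x, dot) * size(y, dot))      # Schwarz holds

z = (6.0, 8.0)                                       # z = x * 2, collinear with x
print(dot(x, z) == size(x, dot) * size(z, dot))      # equality in the collinear case

u, v = (2.0, 1.0), (2.0, -1.0)                       # two timelike vectors in H^2
print(h2(u, v), size(u, h2) * size(v, h2))           # here u . v exceeds ||u|| ||v||
```

The pair $u$, $v$ is one convenient counterexample; any two timelike vectors of $H^2$ pointing into the same half of the cone but not collinear would do as well.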

1.08. Definition. Two vectors $x$, $y$ in a metric vector space are orthogonal
whenever

$$x \cdot y = 0 .$$

If the metric is an inner product, this coincides with the Euclidean idea of
“at right angles” (for which orthogonal is just the Greek) but in an indefinite
metric vector space a null vector is orthogonal to itself. Fig. 1.4 shows several
pairs of vectors in $H^2$, with each matching pair orthogonal.

For any $x \in X$, the set $x^\perp$ of vectors orthogonal to it is a subspace of
$X$, since if $x \cdot y = 0 = x \cdot y'$ we have

$$x \cdot (y + y') = x \cdot y + x \cdot y' = 0 , \qquad x \cdot (ya) = a(x \cdot y) = 0 .$$

($x^\perp$ can be pronounced “x perp”, from perpendicular.) This idea should
be familiar for $\mathbb{R}^3$ with the standard inner product (Fig. 1.5a); for $H^3$ it is
illustrated in Fig. 1.5b-f. The plane in Fig. 1.5d is a good example of a
degenerate subspace larger than a null line.

In a similar way, the set of vectors $y$ for which $x \cdot y$ equals some given
number $a$, is an affine subspace parallel to $x^\perp$. (In the inner product case, it
is the set of vectors “with component $a/|x|$ in the $x$ direction” as in Fig. 1.6.
Notice how this geometrical idea depends on orthogonality.)

Via orthogonality, then, we can go from vectors in $X$ to parallel slicings
of $X$. We have in fact found a transfer from $X$ to $X^*$, since these slices,
for $x \in X$, are exactly the affine hyperplane “contours” (cf. III.1.02) of the
linear functional

$$x^* : X \to \mathbb{R} : y \mapsto x \cdot y .$$

Similarly, given a functional $f$ we have a “gradient vector” for it: we can
choose a vector $x$ in the unique direction orthogonal to $\ker f$, and with a
length indicating how “steep” the functional is - how closely the contours
are spaced. A metric tensor, then, gives us a geometrical way of changing
from contravariant vectors to covariant ones and vice versa. As usual, the
algebra gives us a grip on this (in the next theorem) which is useful in proofs
and computations, but the geometry is the heart of the matter.

1.09. Theorem. For any non-degenerate bilinear form $F$ on a vector space
$X$, the map

$$F_\downarrow : X \to X^* : x \mapsto [y \mapsto F(x, y)]$$

is linear and an isomorphism.

Proof. For any $x, x', y \in X$, $a \in \mathbb{R}$

$$\begin{aligned}
(F_\downarrow(x + x'))y &= F((x + x'), y) = F(x, y) + F(x', y) = F_\downarrow(x)y + F_\downarrow(x')y \\
&= (F_\downarrow(x) + F_\downarrow(x'))y .
\end{aligned}$$

So,

$$F_\downarrow(x + x') = F_\downarrow(x) + F_\downarrow(x') .$$

And

$$F_\downarrow(xa)y = F(xa, y) = aF(x, y) = a(F_\downarrow(x)y) = (aF_\downarrow(x))y .$$

So

$$F_\downarrow(xa) = (F_\downarrow(x))a .$$

Hence $F_\downarrow$ is linear. Since $F$ is non-degenerate,

$$\begin{aligned}
F_\downarrow(x) = 0 &\implies F_\downarrow(x)y = 0 \ \text{for all } y \\
&\implies F(x, y) = 0 \ \text{for all } y \\
&\implies x = 0 .
\end{aligned}$$

Thus $\ker(F_\downarrow) = \{0\}$, so that $n(F_\downarrow) = 0$ (cf. I.2.09).
Hence by Theorem I.2.10 and III.1.04 we have

$$\dim(F_\downarrow X) = r(F_\downarrow) = \dim X = \dim X^* .$$

So, $F_\downarrow X = X^*$, since $X^*$ is the only subspace of itself with the same dimension.
So $F_\downarrow$ is an injective (Exercise I.2.1) and surjective linear map, and
hence an isomorphism by I.2.03. (Notice, once again, that finite dimension
is crucial. Continuity is involved in such infinite dimensional versions as are
true.) □

1.10. Notation. The inverse of the isomorphism $F_\downarrow$ will be denoted by $F_\uparrow$.
In the sequel we shall make extensive use of $G_\downarrow$ and $G_\uparrow$ induced by a metric
tensor $G$.

1.11. Lemma. A non-degenerate bilinear form $F$ on a vector space $X$ induces
a bilinear form $F^*$ on $X^*$ by

$$F^*(f, g) = F(F_\uparrow(f), F_\uparrow(g))$$

(that is, change the functionals to vectors with $F_\uparrow$, and then apply $F$) which
is also non-degenerate and is symmetric/anti-symmetric/.../indefinite (cf.
1.01) according as $F$ is.

Proof. We shall prove non-degeneracy, and leave the preservation of the other
properties as Exercise 8.

If for some $f \in X^*$ we have $F^*(f, g) = 0$ for all $g \in X^*$, this means

$$F(F_\uparrow(f), F_\uparrow(g)) = 0 \quad \text{for all } g \in X^* .$$

So,

$$F(F_\uparrow(f), y) = 0 \quad \text{for all } y \in X \quad (F_\uparrow \text{ surjective}).$$

Hence

$$F_\uparrow(f) = 0 \quad (F \text{ non-degenerate}),$$

and

$$f = 0 \quad (F_\uparrow \text{ injective}).$$

Thus $F^*$ is non-degenerate. □

1.12. Corollary. A metric tensor (respectively inner product) $G$ on $X$ induces
a metric tensor (respectively inner product) $G^*$ on $X^*$. □


Exercises IV.1

1. Prove by Euclidean geometry (no coordinates) that if for $v$, $w$ geometrical
vectors (directed distances from 0) in Euclidean space, with
lengths $|v|$, $|w|$, we define

$$v \cdot w = |v|\,|w| \cos\alpha$$

where $\alpha$ is the angle between them, then

a) $v \cdot w = w \cdot v$

b) $(va) \cdot w = a(w \cdot v)$

c) $v \cdot (u + w) = v \cdot u + v \cdot w$.

2. There are 30 possible implications, such as “(iv) $\Rightarrow$ (v)”, among properties
(i)-(vi) in 1.01. Which are true? Which properties always contradict
each other?

3. The “dot product” $v \cdot w$ defined in Exercise 1 is an inner product.

4. Addition and scalar multiplication of bilinear forms defined pointwise:

$$(F + F')(x, y) = F(x, y) + F'(x, y) , \qquad (Fa)(x, y) = a(F(x, y)) \qquad \text{for all } x, y \in X$$

make $L^2(X; \mathbb{R})$ a vector space.

5. Which of properties (i)-(iv) in 1.01 must hold for $F$ on any subspace
of $X$ if they hold for $F$ on $X$? (Test by looking at, for instance, line
subspaces.)

6. In $H^2$ the null vectors do not form a subspace. Deduce with the aid
of Theorem 3.05 that for no non-zero vector space with an indefinite
metric tensor do the null vectors form a subspace.

7. a) Prove from 1.06 N i), N ii), N iii) that if $\|\ \|$ is a norm, then $\|x\| \ge 0$
for all $x$, so that $\|\ \|$ is also a partial norm.

b) Prove that if $G$ is an inner product, then $\|\ \|_G$ is a norm. (For N iii),
expand $\|x + y\|_G$ and hence $(\|x + y\|_G)^2$, using the bilinearity of $G$.)

c) If $G$ is an indefinite metric, $\|\ \|_G$ is a partial norm. Show that neither
N i) nor N iii) hold, by considering null vectors and sums of null
vectors in, for example, $H^2$.

d) Show that if $G$ is any symmetric bilinear form (in particular a metric
tensor), $G$ can be defined from $|\ |_G$ by means of the result that for all
$x, y$

$$G(x, y) = \tfrac{1}{4}G(x + y, x + y) - \tfrac{1}{4}G(x - y, x - y) = \tfrac{1}{4}\left(|x + y|_G^2 - |x - y|_G^2\right) .$$

(This is known as the polarisation identity. It shows that the Euclidean,
Lorentz and determinant metric tensors of 1.03 can be recovered
from the corresponding (length)$^2$ and determinant functions. In
other words, a quadratic form determines a symmetric bilinear form.
For, if we know $G(x, x)$ for all $x \in X$, the above identity shows how
to find $G(x, y)$.)

8. Prove that $F^*$ on $X^*$ (Lemma 1.11) is symmetric, skew-symmetric,
positive definite, negative definite or indefinite according as $F$ itself
has or has not each property.
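
The polarisation identity of Exercise 7d can be checked numerically before it is proved. The sketch below is an illustration only, with function names of our own; it recovers the Lorentz metric from $x \mapsto x \cdot x$, and recovers a symmetric bilinear form from the $2 \times 2$ determinant, assuming the reading of $(x^1, x^2, x^3, x^4)$ as the matrix with rows $(x^1, x^2)$ and $(x^3, x^4)$.

```python
# Illustrative sketch: recover a symmetric bilinear form from its quadratic form
# via G(x, y) = (1/4) q(x + y) - (1/4) q(x - y).

def polarise(q, x, y):
    """Value at (x, y) of the symmetric bilinear form whose 'dot square' is q."""
    plus = tuple(a + b for a, b in zip(x, y))
    minus = tuple(a - b for a, b in zip(x, y))
    return 0.25 * q(plus) - 0.25 * q(minus)

def lorentz(x, y):
    """Lorentz metric on R^4, timelike coordinate first."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

def lorentz_square(x):
    """The quadratic form x . x of the Lorentz metric."""
    return lorentz(x, x)

def det2(x):
    """Determinant of [[x1, x2], [x3, x4]] (assumed layout)."""
    return x[0] * x[3] - x[1] * x[2]

x, y = (1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)
print(polarise(lorentz_square, x, y), lorentz(x, y))  # the two values agree
print(polarise(det2, x, y))                           # bilinear form behind det
```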


2. Maps 

The isomorphism $G_\downarrow$ constructed above illustrates an unusual point about
a metric tensor; usually, with any structure, we are interested only in functions
that respect it (as linear maps do addition etc.). While maps having
$Ax \cdot Ay = x \cdot y$ in analogous fashion are important (see below, 2.07) they are
not the only maps we allow here; we still work with the whole class of linear
maps. Of equal importance with maps preserving the metric tensor are those
constructed by means of it, such as $G_\downarrow$ and those we are about to define.

One of the most frequent operations with a vector in conventional three-dimensional
space, from the moment it is introduced at school level, is to find
its component in some direction, or in some plane. (If a particle is constrained
to move in some sloping plane, the in-the-plane components of gravity and
other forces that it experiences suffice to determine its motion.) This idea
generalises to metric vector spaces:

2.01. Theorem. Let $S$ be a non-degenerate subspace of a metric vector
space $X$. Then there is a unique linear operator

$$P : X \to S$$

(called orthogonal projection onto $S$) such that $(x - Px) \cdot y = 0$ for all $y \in S$.
(Essentially $Px$ is “the component of $x$ in $S$” and $x - Px$ is “the component
of $x$ orthogonal to $S$” and $x$ is their sum.)

[Fig. 2.1]

Proof. Existence: We define $Px$ by exchanging $x$ for the functional “dot
product with $x$”, restricting that to a functional on $S$, and exchanging the result
for a vector in $S$, via the restriction of the metric. (Which is why we require
the metric to be non-degenerate on $S$, thereby inducing an isomorphism
$S \to S^*$.) Notice that although $S$ is a subspace of $X$, $S^*$ is not a subspace
of $X^*$ in a natural way: the dual of the inclusion $\iota : S \hookrightarrow X$ goes the other
way. We have

$$\iota^* : X^* \to S^* : f \mapsto f|_S = f \circ \iota .$$

Formally, if $G$ is the metric tensor on $X$ and $G'$ the induced metric tensor
on $S$, then

$$G' : S \times S \to \mathbb{R} : (x, y) \mapsto G(x, y) .$$

We set

$$P = (G'_\uparrow) \circ (\iota^*) \circ (G_\downarrow) .$$

Then for any $y \in S$,

$$\begin{aligned}
(x - Px) \cdot y &= x \cdot y - G'_\uparrow(\iota^*(G_\downarrow x)) \cdot y \\
&= x \cdot y - \iota^*(G_\downarrow x)y \quad (y \in S) \\
&= x \cdot y - (G_\downarrow x)y \quad (\text{definition of } \iota^*) \\
&= x \cdot y - x \cdot y \quad (\text{definition of } G_\downarrow) \\
&= 0 .
\end{aligned}$$

Uniqueness: Suppose that we also have $Q : X \to S$ such that

$$(x - Qx) \cdot y = 0 \quad \text{for all } y \in S,\ x \in X.$$

Then by linearity of $G$,

$$\begin{aligned}
(x - Px - (x - Qx)) \cdot y &= 0 \\
(Qx - Px) \cdot y &= 0 \quad \text{for all } y \in S,\ x \in X.
\end{aligned}$$

But $Px$, $Qx$, hence also $Qx - Px$, are in $S$, and $G$ is non-degenerate on $S$,
hence $Px = Qx$ for all $x \in X$. □
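
For the simplest case, where $S$ is the line spanned by a single non-null vector $s$, the defining condition $(x - Px) \cdot s = 0$ is satisfied by $Px = s\,(x \cdot s)/(s \cdot s)$, and by uniqueness this must be the projection. The following sketch uses that closed form, which is a special case we supply rather than a formula from the text; the names are hypothetical and `h2` assumes the metric $x^0y^0 - x^1y^1$.

```python
from fractions import Fraction

# Illustrative sketch: orthogonal projection onto the line spanned by a
# non-null vector s, computed for two different metrics on R^2.

def dot(x, y):
    """Standard inner product on R^2."""
    return x[0] * y[0] + x[1] * y[1]

def h2(x, y):
    """Assumed metric on H^2: x^0 y^0 - x^1 y^1."""
    return x[0] * y[0] - x[1] * y[1]

def project_onto_line(x, s, metric):
    """Return Px = s (x.s)/(s.s); requires s non-null, so that the line is non-degenerate."""
    ss = metric(s, s)
    if ss == 0:
        raise ValueError("s is null: the line it spans is a degenerate subspace")
    coefficient = Fraction(metric(x, s)) / Fraction(ss)
    return tuple(coefficient * component for component in s)

x, s = (1, 3), (2, 1)
p_euclid = project_onto_line(x, s, dot)
p_h2 = project_onto_line(x, s, h2)
print(p_euclid, p_h2)  # same x, same line, different projections

# The residual x - Px is orthogonal to s in the metric used to project.
residual = tuple(a - b for a, b in zip(x, p_h2))
print(h2(residual, s))
```

Exact rational arithmetic is used so that the orthogonality of the residual can be tested with `==` rather than a tolerance.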

2.02. Corollary. The projection operator $P$ onto $S$ is idempotent (cf. I.3.02).

Proof. If $y = Px$ for some $x$, then $y \in S$. Moreover

$$(y - Py) \cdot y' = 0 \quad \text{for any } y' \in S.$$

But $(y - Py) \in S$, and $G$ is non-degenerate on $S$, so

$$y - Py = 0 ,$$

therefore

$$Px = P(Px) \quad \text{for any } x \in X.$$

□

It will be seen in the next section that a metric vector space possesses
non-degenerate subspaces of all dimensions $0 \le d \le n$. Several orthogonal
projections are illustrated in Fig. 2.2: (a) represents a typical projection in
$\mathbb{R}^2$ with the standard inner product, (b) and (c) projections in $H^2$.
Notice by comparing (a) and (c) how strongly the projection depends on
the metric.

[Fig. 2.2, panels (a), (b), (c)]

2.03. Definition. The kernel of the orthogonal projection onto a non-degenerate
subspace $S$ of $X$ (cf. Exercise 1a) is called the orthogonal complement
of $S$ in $X$, and denoted by $S^\perp$. (We shall also call the orthogonal
complement of the subspace spanned by one or more vectors simply the orthogonal
complement of the vector or set of vectors; like $x^\perp$ in 1.08.)

2.04. Lemma. For any non-degenerate subspace $S$ of $X$, each $x \in X$ can be
expressed, in a unique way, as

$$x = s + t ,$$

where $s \in S$, and $t \in S^\perp$. (This gives an example of a direct sum $X = S \oplus S^\perp$,
cf. Exercise VII.3.1 a-d.)

Proof. Set $s = Px$, $t = x - Px$, where $P$ is the orthogonal projection onto $S$.
Then we have $s \in S$, $t \in S^\perp$ by the definition of $P$, and

$$x = s + t .$$

Uniqueness follows from that of $P$. □

2.05. Corollary. If $\dim S = k$, $\dim X = n$, then $\dim S^\perp = n - k$.

Proof. Exercise 1b. □

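2.06. Corollary. If $G$ is non-degenerate on $S$, it is non-degenerate on $S^\perp$.

Proof. If $x \in S^\perp$, then $x \cdot s = 0$ for all $s \in S$. Therefore we have, for $x \in S^\perp$

$$\begin{aligned}
x \cdot t = 0 \ \text{for all } t \in S^\perp
&\implies x \cdot t + x \cdot s = 0 \ \text{for all } s \in S,\ t \in S^\perp \\
&\implies x \cdot (s + t) = 0 \ \text{for all } s \in S,\ t \in S^\perp \\
&\implies x \cdot y = 0 \ \text{for all } y \in X \\
&\implies x = 0 \quad (G \text{ non-degenerate on } X).
\end{aligned}$$

□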

2.07. Definition. A linear map $A : X \to Y$ between metric vector spaces is
an isometry if it is surjective and

$$(Ax) \cdot (Ax') = x \cdot x' \quad \text{for all } x, x' \in X.$$

(That is, if it preserves the metric, just as linearity means “preserving addition”
with $Ax + Ay = A(x + y)$ etc.) “Preserving lengths and angles” is
evidently a strong condition: in particular it implies that $A$ is injective as
well as surjective, and hence an isomorphism (Exercise 2c). If $A$ preserves
the metric but is not surjective, it is an isometry into $Y$ (Exercise 2d).

An operator on $X$ which is an isometry is called unitary or orthogonal.
(This term arose when attention was more on matrices than on the operators
they describe. It will be accounted for at the end of §3 below. See also
Exercise 3.7.)

In an inner product space, an orthogonal operator with positive determinant
is simply a rotation since it preserves lengths and angles. The
determinant condition ensures that the space is not “turned over”. For example,
if $A(x, y) = (-x, y)$ then $A$ is not a rotation but it is an orthogonal
operator on $\mathbb{R}^2$ with the standard inner product. Orthogonal operators must
take unit/spacelike/timelike/null vectors to vectors of the same sort, and can
(Exercise 3.7) take a vector $v$ to any $w$ with $w \cdot w = v \cdot v$. Fig. 2.3 shows
the surfaces consisting of the possible end points for images of a vector $x$
under an orthogonal operator $A$, in various situations. For a more detailed
discussion, see [Porteous], p. 427.

For Lorentz space $L^4$, an orthogonal operator is sometimes called a
Lorentz transformation, but this term is often reserved for a change of orthonormal
basis (cf. next section) which involves the complications discussed
in I.2.08 and III.1.07. It is thus a good idea to avoid using the same term for
both, and we shall stick to the second usage. (In space, rotating an object
and rotating your axes for its description are both practicable. On the other
hand, if $L^4$ is thought of as spacetime, then it is obviously hard physically
to “move it around” by an operator, whereas relabelling is just a matter of
changing how you look at it - or who looks at it. A choice of who is “at rest”
is exactly a choice of $x^0$-axis, since moving along this axis or parallel to it involves
no change in “space” coordinates, only in time, which is what “at rest
in a frame of reference” means: cf. XI.2.01.)

[Fig. 2.3: (a) $\mathbb{R}^3$, standard inner product; (b) $H^3$, $x$ timelike; (c) $H^3$, $x$ spacelike]

2.08. Definition. The adjoint $A^T$ of a linear operator $A$ on a metric vector
space $X$ is defined by the equation

$$A^Tx \cdot y = x \cdot Ay \quad \text{for all } x, y \in X.$$

(The geometrical meaning of $A^T$ depends on the nature of $A$; if $A$ is orthogonal,
for instance, $A^T$ coincides with the inverse of $A$ (Lemma 2.09). In other cases, such
as $A$ a projection, $A^T$ coincides with $A$ (Lemma 2.11). These two situations
are the important cases.) Obviously, $(A^T)^T = A$ by the symmetry of the
metric.

The defining equation means exactly that

$$(G_\downarrow(A^Tx))y = (G_\downarrow x)Ay .$$

So,

$$(G_\downarrow(A^Tx))y = (A^*(G_\downarrow x))y \quad \text{for all } y \in X$$

(definition of $A^*$, III.1.03). Hence we have,

$$G_\downarrow(A^Tx) = A^*(G_\downarrow x) , \qquad A^Tx = G_\uparrow(A^*(G_\downarrow x)) .$$

[Commutative diagram: $A^* : X^* \to X^*$ along the top, $A^T : X \to X$ along the bottom, joined by $G_\downarrow : X \to X^*$ and $G_\uparrow : X^* \to X$.]

Therefore $A^T$ exists and is unique, being exactly the composite

$$A^T = G_\uparrow A^* G_\downarrow .$$

An operator $A$ on $X$ is self-adjoint if $A^T = A$, or equivalently if $Ax \cdot y =
x \cdot Ay$ for all $x, y \in X$.
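
In coordinates the composite $G_\uparrow A^* G_\downarrow$ becomes a matrix product. The sketch below assumes the standard facts that, in a basis and its dual, $A^*$ is represented by the transposed matrix and $G_\downarrow$ by the matrix of components $g_{ij} = G(b_i, b_j)$, so that $A^T$ is represented by $g^{-1} A^{t} g$. The helper names are ours, and the example uses $H^2$ in its standard basis, where the matrix of $G$ is its own inverse.

```python
# Illustrative sketch: the adjoint in coordinates as g^{-1} A^t g,
# checked against the defining equation (A^T x) . y = x . (A y).

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def apply(a, x):
    """Apply the matrix a to the coordinate tuple x."""
    return tuple(sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a)))

def form(g, x, y):
    """Evaluate the bilinear form with components g on coordinate tuples x, y."""
    return sum(g[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))

def adjoint(a, g, g_inv):
    """Matrix of A^T, assuming A* is the transpose and G-lowering is g."""
    return matmul(matmul(g_inv, transpose(a)), g)

g = [[1, 0], [0, -1]]          # H^2 in the standard basis; here g is its own inverse
A = [[1, 2], [3, 4]]
A_T = adjoint(A, g, g)
print(A_T)

x, y = (1, 1), (2, 5)
print(form(g, apply(A_T, x), y), form(g, x, apply(A, y)))  # the two sides agree
```

With the standard inner product, where the matrix of $G$ is the identity, `adjoint` reduces to the plain transpose.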

Self-adjoint operators are very common and very useful (hence very important)
in a great variety of contexts, particularly when the vector space in
question is infinite-dimensional and a lot of tools useful in finite dimensions
no longer apply. Self-adjointness has a very straightforward geometrical interpretation,
which - since it is not intimately related to the nice form such
operators can be given in coordinates - we leave till after the next section.

2.09. Lemma. An operator $A$ on a metric vector space is orthogonal if and
only if

$$A^TA = I .$$

Proof. For any $x \in X$,

$$\begin{aligned}
Ax \cdot Ay = x \cdot y \ \text{for all } y \in X
&\iff A^T(Ax) \cdot y = x \cdot y \ \text{for all } y \in X \quad (\text{definition of } A^T) \\
&\iff (A^TAx - Ix) \cdot y = 0 \ \text{for all } y \in X \\
&\iff A^TAx - Ix = 0 \quad \text{by non-degeneracy},
\end{aligned}$$

hence $Ax \cdot Ay = x \cdot y$ for all $x, y \in X$ $\iff$ $A^TA = I$. □
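
Lemma 2.09 gives a finite test for orthogonality once matrices are available. The following sketch applies it in $H^2$, again assuming that $A^T$ is represented by $g^{-1} A^{t} g$ in a basis with components $g_{ij}$; the example operators (a "boost" with rational entries, a shear and a reflection) and all names are our own choices.

```python
from fractions import Fraction

# Illustrative sketch: test A^T A = I, with A^T represented by g^{-1} A^t g.

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def is_orthogonal(a, g, g_inv):
    """True when A^T A = I for the metric with components g and inverse g_inv."""
    a_t = matmul(matmul(g_inv, transpose(a)), g)
    n = len(a)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(a_t, a) == identity

g_h2 = [[1, 0], [0, -1]]      # H^2, standard basis (its own inverse)
g_euclid = [[1, 0], [0, 1]]   # standard inner product on R^2

c, s = Fraction(5, 3), Fraction(4, 3)   # chosen so that c^2 - s^2 = 1
boost = [[c, s], [s, c]]
shear = [[1, 1], [0, 1]]
reflection = [[-1, 0], [0, 1]]          # the map (x, y) -> (-x, y) of 2.07

print(is_orthogonal(boost, g_h2, g_h2))             # orthogonal for H^2
print(is_orthogonal(boost, g_euclid, g_euclid))     # not for the standard inner product
print(is_orthogonal(shear, g_h2, g_h2))             # not orthogonal
print(is_orthogonal(reflection, g_euclid, g_euclid))
```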

2.10. Corollary. An operator $A$ on a metric vector space is orthogonal if
and only if $A^T$ is orthogonal. □

2.11. Lemma. Orthogonal projection $P$ onto a non-degenerate subspace $S$
of a metric vector space $X$ is a self-adjoint operator.

Proof. Let $x = s + t$, $y = s' + t'$, with $s, s' \in S$, $t, t' \in S^\perp$ (cf. 2.04). Then

$$Px \cdot y = P(s + t) \cdot (s' + t') = s \cdot (s' + t') = s \cdot s' + s \cdot t' = s \cdot s'$$

and similarly

$$x \cdot Py = s \cdot s' . \qquad □$$


Exercises IV.2 

1. a) The orthogonal complement of a non-degenerate subspace $S$ of $X$ is
exactly the set $\{ x \in X \mid x \cdot y = 0 \text{ for all } y \in S \}$ of vectors orthogonal
to all those in $S$.

b) If $b_1, \ldots, b_k$ is a basis for a non-degenerate subspace $S$, and $b'_1, \ldots, b'_l$
is a basis for $S^\perp$, show that between them they span $X$. Deduce their
linear independence from the uniqueness in 2.04, and hence that they
form a basis for $X$.

c) From (b), or from Theorem I.2.10, deduce that $\dim(S) + \dim(S^\perp) =
\dim X$.

d) Suppose that an affine subspace $S$ of $X$ has $d(S \times S)$ a non-degenerate
subspace $V$ of the vector space $T$ of $X$, for a given metric tensor on $T$.
Then there is a unique affine map $P : X \to S$ such that $(d(Px, x)) \cdot y = 0$
for any $x \in X$, $y \in V$, and that $P^2 = P$. (We shall call this also
“orthogonal projection”.) Show that $P$ depends on the metric.

2. a) In any metric vector space, for any operator $A$, $\det A^T = \det A$.
(Consider the determinant of the matrix of $A^T = G_\uparrow A^* G_\downarrow$ in any
basis, using Theorem I.3.08, and apply Chapter III, Exercise 2.)

b) Deduce via Lemma 2.09 that if $A$ is orthogonal then

$$\det A = \pm 1 .$$

c) Deduce that any isometry is an isomorphism.

d) Deduce that any isometry into is injective and an isometry onto its
image.

3. What are the pictures for $H^2$ corresponding to those in Fig. 2.3b,c for
$H^3$? Find the equations in coordinates of the surfaces shown and of
the curves you draw.

4. From the polarisation identity (Exercise 1.7d)

$$Ax \cdot Ay = x \cdot y \ \ \forall x, y \iff Ax \cdot Ax = x \cdot x \ \ \forall x .$$

So $A$ is orthogonal (preserving lengths and angles in the Euclidean
case) if it preserves “dot squares” alone. Does preserving the size $\|\ \|_G$
imply orthogonality?

3. Coordinates 

3.01. Metrics. For a metric vector space $(X, G)$, how do we write $G$, $G_\downarrow$,
etc. in coordinates?

Choose a basis $\beta = b_1, \ldots, b_n$. Then we have coordinates for vectors.
Suppose we know for any two basic vectors $b_i$, $b_j$ what $G(b_i, b_j)$ is. For two
general vectors, $x = (x^1, \ldots, x^n)$, $y = (y^1, \ldots, y^n)$ in these coordinates, we
have $x = x^ib_i$, $y = y^jb_j$. Now we use the bilinearity of $G$.

$$\begin{aligned}
G(x, y) &= G(x^ib_i, y^jb_j) \\
&= x^iG(b_i, y^jb_j) \quad (\text{linearity in 1st variable}) \\
&= x^iy^jG(b_i, b_j) \quad (\text{linearity in 2nd variable})
\end{aligned}$$

Thus if we define

$$g_{ij} = G(b_i, b_j)$$

we have the formula

$$G(x, y) = g_{ij}x^iy^j .$$

There are two lower indices for the components (Exercise 1a) $g_{ij}$ of $G$, since
$G$ is covariant “twice over”: it is a map from two copies of $X$, where a
covariant vector is a map from one.

Notice that whatever basis we choose, we have $g_{ij} = g_{ji}$ in the corresponding
representation of $G$, since by definition a metric tensor must be
symmetric.
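
The recipe of 3.01 can be tried on a basis that is deliberately not “nice”. The sketch below is illustrative only (names are ours, and `h2` assumes the metric $x^0y^0 - x^1y^1$): it computes the components $g_{ij} = G(b_i, b_j)$ for a basis of $H^2$ containing a null vector and checks that $g_{ij}x^iy^j$ agrees with $G$ evaluated directly.

```python
# Illustrative sketch: components g_ij of a metric in a chosen basis,
# and the coordinate formula G(x, y) = g_ij x^i y^j.

def h2(x, y):
    """Assumed metric on H^2: x^0 y^0 - x^1 y^1."""
    return x[0] * y[0] - x[1] * y[1]

def components(metric, basis):
    """Matrix of g_ij = G(b_i, b_j) for the given basis."""
    return [[metric(bi, bj) for bj in basis] for bi in basis]

def from_coordinates(coords, basis):
    """The vector x^i b_i with the given coordinates."""
    return tuple(sum(c * b[k] for c, b in zip(coords, basis))
                 for k in range(len(basis[0])))

def evaluate(g, xs, ys):
    """The double sum g_ij x^i y^j."""
    return sum(g[i][j] * xs[i] * ys[j]
               for i in range(len(xs)) for j in range(len(ys)))

basis = [(1, 0), (1, 1)]        # b_2 is a null vector of H^2
g = components(h2, basis)
print(g)                        # symmetric, as a metric tensor must be

xs, ys = (2, 3), (1, -1)        # coordinates with respect to the basis above
x, y = from_coordinates(xs, basis), from_coordinates(ys, basis)
print(evaluate(g, xs, ys), h2(x, y))   # the two values agree
```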

3.02. Duality. Since $G_\downarrow$ is an isomorphism it takes a basis $\beta = b_1, \ldots, b_n$
for $X$ to a basis $G_\downarrow\beta = G_\downarrow b_1, \ldots, G_\downarrow b_n$ for $X^*$. Using these two bases, its
matrix is of course just $[G_\downarrow]^{G_\downarrow\beta}_{\beta} = [\delta^i_j]$, the identity matrix, which is nice
and simple. However, the advantages of the dual basis $\beta^*$ defined without
reference to a metric (III.1.07) still apply, and it is in general handier to
use $\beta^*$. In this basis, the $j$-th component of any $f \in X^*$ is the value of $f$ on
$b_j$. In particular, for instance,

$$\begin{aligned}
\text{3rd component of } G_\downarrow(x) &= (G_\downarrow x) b_3 \\
  &= G(x, b_3) \\
  &= g_{ij} x^i y^j \quad \text{where } y^j = \begin{cases} 0, & j \neq 3 \\ 1, & j = 3 \end{cases} \\
  &= g_{i3} x^i .
\end{aligned}$$

Similarly for the other components, so

$$\begin{aligned}
G_\downarrow(x) &= (g_{i1} x^i, \ldots, g_{in} x^i) \\
  &= g_{ij} x^i b^j, \quad \text{where } b^1, \ldots, b^n \text{ is the dual basis,} \\
  &= g_{ij} x^i \quad \text{for short.}
\end{aligned}$$

(We shall often shorten the symbols for indexed quantities in this fashion.
Thus, as indicated in I.2.07 we may call a vector $x^i$ instead of $(x^1, \ldots, x^n)$,
and similarly a metric tensor just $g_{ij}$, etc.)

So the matrix $[G_\downarrow]^{\beta^*}_{\beta}$ is just

$$\begin{bmatrix}
g_{11} & g_{12} & \cdots & g_{1n} \\
g_{21} & & & \vdots \\
\vdots & & & \\
g_{n1} & \cdots & & g_{nn}
\end{bmatrix}$$

and the matrix $[G_\uparrow]^{\beta}_{\beta^*}$ is its inverse, whose entry in the $i$-th row and $j$-th
column we denote by $g^{ij}$. Then the $g^{ij}$ are defined by the $n^2$ equations

$$g^{ij} g_{jk} = \delta^i_k \quad (\text{or } g_{ki} g^{ij} = \delta^j_k, \text{ etc.}) .$$

Moreover, if we have $x, y \in X^*$, $x_i$, $y_i$ in dual coordinates to $\beta$, then
(cf. 1.11, 1.12):

$$\begin{aligned}
G^*(x, y) &= G(G_\uparrow x, G_\uparrow y) \\
  &= G(g^{ik} x_k b_i, g^{jl} y_l b_j) \\
  &= g_{ij} (g^{ik} x_k)(g^{jl} y_l) \\
  &= (g_{ij} g^{jl}) g^{ik} x_k y_l \\
  &= \delta^l_i g^{ik} x_k y_l \\
  &= g^{kl} x_k y_l, \quad \text{since obviously } g^{kl} = g^{lk}, \\
  &= g^{ij} x_i y_j, \quad \text{changing dummy indices.}
\end{aligned}$$

So the components of $G^*$ in dual coordinates are exactly the $g^{ij}$'s.
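The index manipulations of 3.02 can be sketched in exact arithmetic: lowering an index with $g_{ij}$, raising it again with the inverse matrix $g^{ij}$, and evaluating the dual metric $G^*$. The matrix $[g_{ij}]$ below is an assumed example (a non-diagonal but non-degenerate symmetric matrix), and the helper names are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(g):
    """Inverse [g^ij] of a 2x2 matrix [g_ij], in exact rational arithmetic."""
    (a, b), (c, d) = g
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("degenerate form: no inverse, so not a metric")
    return [[d / det, -b / det], [-c / det, a / det]]

def lower(g, x):
    """Components of G_down(x) in the dual basis: g_ij x^i."""
    n = len(x)
    return [sum(g[i][j] * x[i] for i in range(n)) for j in range(n)]

def raise_index(g_inv, f):
    """Components of G_up(f) in the original basis: g^ij f_j."""
    n = len(f)
    return [sum(g_inv[i][j] * f[j] for j in range(n)) for i in range(n)]

def metric(g, x, y):
    """G(x, y) = g_ij x^i y^j."""
    n = len(x)
    return sum(g[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def dual_metric(g_inv, f, h):
    """G*(f, h) = g^ij f_i h_j."""
    n = len(f)
    return sum(g_inv[i][j] * f[i] * h[j] for i in range(n) for j in range(n))

g = [[3, 1], [1, 0]]              # assumed example components g_ij
g_inv = inverse_2x2(g)            # components g^ij

x, y = [3, -1], [1, 4]
fx, fy = lower(g, x), lower(g, y)
print(g_inv, fx, fy, dual_metric(g_inv, fx, fy), metric(g, x, y))
```

Raising after lowering returns the original vector, and $G^*(G_\downarrow x, G_\downarrow y) = G(x, y)$, as the derivation above requires.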

When we are working in coordinates, it will obviously be a help to choose
them so that these formulae become as simple as possible: that is, we want
the matrix $[g_{ij}]$ to have a nice simple form. To this end we define:

3.03. Definition. An orthogonal set in a metric vector space $X$ is a subset $S$
of $X$ any two of whose members - $x$, $y$ say - are orthogonal and non-null:
$x \cdot x \neq 0 \neq y \cdot y$, $x \cdot y = 0$. (This implies that $S$ must be linearly independent;
cf. Exercise 2.)

An orthonormal set in $X$ is an orthogonal set of unit vectors (cf. 1.06).
An orthonormal basis for $X$ is a basis which is an orthonormal set.

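3.04. Lemma. For $\beta = b_1, \ldots, b_n$ an orthonormal basis for $X$, in $\beta$-coordinates
we have

$$g_{ij} = \pm\delta_{ij} .$$

Proof.

$$g_{ij} = b_i \cdot b_j = \begin{cases} 0, & \text{if } i \neq j \quad (\beta \text{ orthogonal}) \\ \pm 1, & \text{if } i = j \quad (b_1, \ldots, b_n \text{ unit vectors}) \end{cases}$$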

3.05. Theorem. Every metric vector space $(X, G)$ possesses at least one
orthonormal basis.

Proof. First we need a technical lemma:

3.06. Lemma. $X$ possesses at least one non-null vector.

Proof. Suppose not. Then $x \cdot x = 0$, all $x \in X$. Hence

$$(y + z) \cdot (y + z) = 0, \quad \text{all } y, z \in X .$$

So

$$y \cdot y + 2\, y \cdot z + z \cdot z = 0, \quad \text{all } y, z \in X \quad (\text{since } z \cdot y = y \cdot z),$$

$$y \cdot z = -\tfrac{1}{2}(y \cdot y + z \cdot z) = 0 .$$

Thus $G$ is the zero form, which is completely degenerate and hence not a
metric as assumed.

Hence for $G$ a metric, $X$ has at least one non-null vector. □


Now by the lemma, choose $x_1 \in X$ such that $x_1 \cdot x_1 \neq 0$ and normalise
by setting

$$b_1 = \frac{x_1}{\sqrt{|x_1 \cdot x_1|}} \qquad \text{(cf. 1.06)}$$


IV. Metric Vector Spaces 


Then suppose inductively that $n > k \geq 1$ and $b_1, \ldots, b_k$ is an orthonormal
set (the vector $b_1$ on its own is, so we have proved it for $k = 1$). Let $B_k$ be
the subspace they span. $G$ is non-degenerate on $B_k$ since if $x \in B_k$, $x = x^i b_i$
with $i = 1, \ldots, k$, so that if $x \cdot y = 0$ for all $y \in B_k$, in particular $\pm x^i = x \cdot b_i = 0$ for
each $i$. Hence $x = 0$. Then $\dim(B_k^\perp) = n - k \neq 0$ and $G$ is non-degenerate on
$B_k^\perp$ (2.05, 2.06). Hence by the lemma, we may choose $x_{k+1} \in B_k^\perp$ non-null
and set

$$b_{k+1} = \frac{x_{k+1}}{\sqrt{|x_{k+1} \cdot x_{k+1}|}} ,$$

a unit vector orthogonal to each of $b_1, \ldots, b_k$.

Inductively, this produces an orthonormal set of $n$ vectors, which must
(Exercise 2) be linearly independent and hence a basis for $X$. □


3.07. Remark. For convenience we shall always order an orthonormal basis
$b_1, \ldots, b_n$ so that

$$b_i \cdot b_i = \begin{cases} +1, & i \leq k \\ -1, & i > k \end{cases}$$

for some $k$, putting the “timelike” vectors first, as with the Lorentz metric
(1.03). This gives us the standard formula

$$x \cdot y = x^1 y^1 + \cdots + x^k y^k - x^{k+1} y^{k+1} - \cdots - x^n y^n$$

which $G$ will have in any orthonormal basis, with not even $k$ depending on
the choice of basis:

3.08. Theorem. For any two orthonormal bases $\beta = b_1, \ldots, b_n$ and $\beta' =
b'_1, \ldots, b'_n$ for a metric vector space $(X, G)$, with

$$b_i \cdot b_i = \begin{cases} +1, & i \leq k \\ -1, & i > k \end{cases}
\qquad \text{and} \qquad
b'_i \cdot b'_i = \begin{cases} +1, & i \leq l \\ -1, & i > l \end{cases}$$

we have $k = l$.

Proof. If $k = 0$ or $n$ then $G$ is definite and $k = l$, so suppose that $0 < k < n$.
On the subspace $N$ of $X$ spanned by $b_{k+1}, \ldots, b_n$, $G$ is negative definite,
since if $x \in N$

$$G(x, x) = G(x^i b_{k+i}, x^j b_{k+j}) = -\sum_{i=1}^{n-k} (x^i)^2 .$$

Consider any subspace $W$ of $X$ on which $G$ is positive definite, and
choose a basis $\omega = w_1, \ldots, w_r$ for $W$. Then the set

$$P = \{ w_1, \ldots, w_r, b_{k+1}, \ldots, b_n \}$$

is linearly independent. For suppose

$$a^1 w_1 + \cdots + a^r w_r + a^{r+1} b_{k+1} + \cdots + a^{r+(n-k)} b_n = 0 .$$

Then we have

$$(*) \qquad a^i w_i = -a^{r+j} b_{k+j}$$

and hence

$$(**) \qquad (a^i w_i) \cdot (a^i w_i) = (-a^{r+j} b_{k+j}) \cdot (-a^{r+j} b_{k+j}) .$$

But $G$ is positive on $W$, negative on $N$, hence both sides of (**) are zero, since
LHS $\geq 0$, RHS $\leq 0$. Hence both sides of (*) are zero, by the definiteness of $G$
on $N$ and $W$. Hence $a^i = 0$, $i = 1, \ldots, r + (n - k)$, by the linear independence
of $\omega$ and $\beta$, so $P$ is indeed independent.

$P$ therefore must have $\leq n$ members, since an independent set cannot
have more than $\dim X$ members. Hence

$$\dim W = r \leq k .$$

In particular, on the subspace spanned by $\{b'_1, \ldots, b'_l\}$ $G$ is positive
definite, hence $l \leq k$.

But by a similar argument, $k \leq l$.

Hence $k = l$. □

3.09. Corollary. The quantity $g_{ii} = k(+1) + (n - k)(-1) = 2k - n$ is
independent of which orthonormal basis is used to give $G$ in coordinates. □

This quantity is usually called the signature of $G$; it specifies $k$, by
$k = \tfrac{1}{2}(g_{ii} + n)$, and is given by a shorter formula than $k$ in terms of the
components of $G$ (cf. Exercise 6).
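The inductive construction of Theorem 3.05, with Lemma 3.06 supplying a non-null vector at each step, can be sketched in code. To stay in exact rational arithmetic this sketch stops at an orthogonal basis of non-null vectors rather than an orthonormal one (normalising would divide each vector by $\sqrt{|x \cdot x|}$), which is enough to read off $k$ and the signature $2k - n$. All function names are hypothetical, and the "determinant metric" used in the example is assumed here to be the polarisation of $ad - bc$ on $\mathbf{R}^4$.

```python
from fractions import Fraction

def dot(g, x, y):
    """x . y = g_ij x^i y^j."""
    n = len(g)
    return sum(g[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def find_non_null(g, vectors):
    """Lemma 3.06: if every v and every v + w in a spanning set is null,
    the form vanishes on the span; otherwise one of them is non-null."""
    for v in vectors:
        if dot(g, v, v) != 0:
            return v
    for a in range(len(vectors)):
        for b in range(a + 1, len(vectors)):
            s = [p + q for p, q in zip(vectors[a], vectors[b])]
            if dot(g, s, s) != 0:
                return s
    return None

def orthogonal_basis(g):
    """Follow the proof of 3.05: pick a non-null x, pass to its orthogonal
    complement (spanned by the projections of the old spanning set), repeat."""
    n = len(g)
    span = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    basis = []
    while len(basis) < n:
        x = find_non_null(g, span)
        if x is None:
            raise ValueError("form is degenerate, so it is not a metric")
        basis.append(x)
        xx = dot(g, x, x)
        span = [[vi - dot(g, v, x) / xx * xi for vi, xi in zip(v, x)] for v in span]
        span = [v for v in span if any(v)]       # drop zero vectors
    return basis

def signature(g):
    """2k - n, where k counts basis vectors with positive dot square (3.09)."""
    basis = orthogonal_basis(g)
    k = sum(1 for b in basis if dot(g, b, b) > 0)
    return 2 * k - len(g)

lorentz = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
h = Fraction(1, 2)
# Assumed form of the determinant metric: x . x = ad - bc for x = (a, b, c, d).
determinant = [[0, 0, 0, h], [0, 0, -h, 0], [0, -h, 0, 0], [h, 0, 0, 0]]
print(signature(lorentz), signature(determinant))
```

By Theorem 3.08 the count $k$ does not depend on which non-null vectors the search happens to pick, so the returned signature is well defined even though the basis itself is not.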

3.10. Corollary (“Sylvester’s Law of Inertia”). For any symmetric bilinear
form $F : X \times X \to \mathbf{R}$, there is a choice of basis for which $F$ has the form

$$F(x, x) = (x^1)^2 + \cdots + (x^k)^2 - (x^{k+1})^2 - \cdots - (x^{k+l})^2, \quad k + l \leq n,$$

where $x = x^1 b_1 + \cdots + x^n b_n$.

Unless $k$ or $l$ is zero, the subspace $V^+$ spanned by the basis vectors with
$b_i \cdot b_i = +1$ depends on the choice of basis; so does the subspace $V^-$ spanned
by those with $b_i \cdot b_i = -1$. However, $V^0$, spanned by those with $b_i \cdot b_i = 0$,
depends only on $F$, as do the dimensions $k$ and $l$.

Proof. The set $V^N = \{\, x \mid F(x, y) = 0,\ \forall y \in X \,\}$ is a subspace of $X$ since
$F(x, y) = F(x', y) = 0\ \forall y$ implies, by linearity,

$$F(x + x', y) = F(x, y) + F(x', y) = 0$$

and

$$F(\lambda x, y) = \lambda F(x, y) = 0 .$$

Choose basis vectors for $V^N$ and extend to a basis $b_1, \ldots, b_n$ for $X$, where
$b_{i+1}, \ldots, b_n$ are the basis vectors for $V^N$. Hence $\dim V^N = n - i$.

Denote by $W$ the subspace spanned by $b_1, \ldots, b_i$. Now let $w \in W$.

Then:

$$F(w, v) = 0 \quad \forall v \in W$$

implies that:

$$F(w, x^1 b_1 + \cdots + x^i b_i) + x^{i+1} F(w, b_{i+1}) + \cdots + x^n F(w, b_n) = 0$$

for all $(x^1, \ldots, x^n)$.

Using the bilinearity of $F$ it follows that

$$\begin{aligned}
& F(w, x^1 b_1 + \cdots + x^n b_n) = 0 \quad \forall (x^1, \ldots, x^n) \\
\Rightarrow\ & F(w, x) = 0 \quad \forall x \in X \\
\Rightarrow\ & w \in V^N .
\end{aligned}$$

But $w \in W$ so $w = 0$ because $\{b_1, \ldots, b_n\}$ is independent. Therefore $F|_{W \times W}$
is non-degenerate and we can apply Theorem 3.05. Thus we replace $b_1, \ldots, b_i$
by the orthonormal basis resulting for $W$ and we obtain the required form
for $F$.

The independence of $k$ and $l$ of all except $F$ itself follows by the same
argument as Theorem 3.08. Evidently any other expression of $F$ in the form
above has $b_{i+1}, \ldots, b_n \in V^N$, and being linearly independent and $n - i$ in
number these vectors span $V^N$. Accordingly the $V^0$ given by any basis of
the required kind has $V^0 = V^N$, whose definition involves only $F$. □

3.11. Lemma. The dual basis $\beta^* = b^1, \ldots, b^n$ to an orthonormal basis $\beta =
b_1, \ldots, b_n$ for $(X, G)$ is orthonormal in the dual metric $G^*$ on $X^*$.

Proof.

$$\begin{aligned}
\beta \text{ is orthonormal} &\iff G(b_i, b_j) = \pm\delta_{ij} \\
&\iff [g_{ij}] = \begin{bmatrix} I & 0 \\ 0 & -I \end{bmatrix} = M \text{ for short} && \text{(using 3.01)} \\
&\qquad \text{(up to order along diagonal)} \\
&\iff [g^{ij}] = M^{-1} = M \\
&\iff \beta^* \text{ is orthonormal.}
\end{aligned}$$

□

3.12. Corollary. The signature of $G^*$ equals the signature of $G$. (Exercise 3)

□

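3.13. Lemma. If $A$ is an operator on an inner product space $(X, G)$, then
(in the notations 2.08, III.1.06)

$$[A^T]^\beta_\beta = ([A]^\beta_\beta)^t$$

with respect to any orthonormal basis $\beta = b_1, \ldots, b_n$.

Proof. $(G_\downarrow b_i) b_j = b_i \cdot b_j = \delta_{ij} = b^i(b_j)$, hence $G_\downarrow(b_i) = b^i$.

Hence $[G_\downarrow]^{\beta^*}_\beta$ is the identity matrix $I$, hence so also is $[G_\uparrow]^\beta_{\beta^*}$.

Thus as matrices,

$$\begin{aligned}
[A^T]^\beta_\beta &= [G_\uparrow A^* G_\downarrow]^\beta_\beta = [G_\uparrow]^\beta_{\beta^*} [A^*]^{\beta^*}_{\beta^*} [G_\downarrow]^{\beta^*}_\beta \\
&= I [A^*]^{\beta^*}_{\beta^*} I = [A^*]^{\beta^*}_{\beta^*} = ([A]^\beta_\beta)^t \quad \text{by III.1.07.}
\end{aligned}$$

□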

When $G$ is indefinite, the matrix of the adjoint is still closely related to
the transpose. However, if $a^i_j$ is a term giving the image of a timelike basis
vector $b_j$ a component along a spacelike basis vector $b_i$, then we have a sign
change in the $i$ aspect from $G_\uparrow$ but not in the $j$ aspect from $G_\downarrow$, so the sign
of $a^i_j$ changes. This is a vague statement: the precise one, of which 3.13 is a
special case, follows.

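3.14. Lemma. If $A$ is an operator on a metric vector space $(X, G)$ then with
respect to any orthonormal basis $b_1, \ldots, b_n$ we have

$$[A^T]^i_j = \left( \frac{b_j \cdot b_j}{b_i \cdot b_i} \right) [A]^j_i$$

for each $i, j$ (with no summation).

Proof. Let $[A]^i_j = a^i_j$. We illustrate the case for $i = 3$.

$$\begin{aligned}
(A b_j) \cdot b_3 &= (a^k_j b_k) \cdot b_3 \\
  &= a^k_j (b_k \cdot b_3) \\
  &= a^3_j (b_3 \cdot b_3) \quad \text{since } b_k \cdot b_3 = 0,\ k \neq 3 \\
  &= [A]^3_j (b_3 \cdot b_3) .
\end{aligned}$$

Hence,

$$\begin{aligned}
[A^T]^i_j &= \frac{(A^T b_j) \cdot b_i}{b_i \cdot b_i} \\
  &= \frac{(G_\uparrow A^* G_\downarrow b_j) \cdot b_i}{b_i \cdot b_i} \\
  &= \frac{(A^*(G_\downarrow b_j))\, b_i}{b_i \cdot b_i} \\
  &= \frac{(G_\downarrow b_j)(A b_i)}{b_i \cdot b_i} \\
  &= \frac{b_j \cdot A b_i}{b_i \cdot b_i} \\
  &= \frac{a^j_i (b_j \cdot b_j)}{b_i \cdot b_i} \quad \text{by the equation above, not summing over } i \text{ or } j.
\end{aligned}$$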

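Thus if any operator $Q$ on a metric vector space of dimension $n$ and
signature $\sigma$ has a matrix, subdivided into

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

(with $A$ occupying the first $\tfrac{1}{2}(\sigma + n)$ rows and the first $\tfrac{1}{2}(\sigma + n)$ columns)
in an orthonormal basis with the first $\tfrac{1}{2}(\sigma + n)$ $b_i$'s timelike, then the adjoint
has the matrix

$$\begin{bmatrix} A^t & -C^t \\ -B^t & D^t \end{bmatrix} .$$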

3.15. Corollary. An operator $A$ on a metric vector space is self-adjoint if
and only if with respect to an orthonormal basis its matrix $[a^i_j]$ satisfies

$$a^i_j = \left( \frac{b_j \cdot b_j}{b_i \cdot b_i} \right) a^j_i$$

for each $i, j$ (without summation). □

With an inner product space, this means simply that $[a^i_j]$ is symmetrical
about the diagonal; hence in this case self-adjoint operators are often called
symmetric operators. With an indefinite metric the matrix is “symmetric”
in some parts and “skew-symmetric” in others, and it is more natural to stick
to “self-adjoint” (cf. 2.08).
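The sign rule of Lemma 3.14 and the block form of the adjoint can be checked on small integer matrices. This is a sketch under the assumption that the metric is given in an orthonormal basis by a tuple `signs` of entries $b_i \cdot b_i = \pm 1$; the function names are hypothetical.

```python
def apply(a, x):
    """Matrix a = [a^i_j] (row i, column j) acting on the column vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def dot(signs, x, y):
    """x . y in an orthonormal basis with b_i . b_i = signs[i]."""
    return sum(s * xi * yi for s, xi, yi in zip(signs, x, y))

def adjoint(a, signs):
    """[A^T]^i_j = (b_j.b_j / b_i.b_i) a^j_i, no summation.
    Since each sign is +1 or -1, dividing by it equals multiplying by it."""
    n = len(a)
    return [[signs[i] * signs[j] * a[j][i] for j in range(n)] for i in range(n)]

def is_self_adjoint(a, signs):
    return adjoint(a, signs) == a

signs = (1, -1, -1)                       # one timelike, two spacelike vectors
q = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]     # blocks A=[1], B=[2 3], C=[4;7], D
qt = adjoint(q, signs)
print(qt)   # blocks A^t, -C^t, -B^t, D^t

print(is_self_adjoint([[5, 2], [-2, 7]], (1, -1)))   # "skew" off-diagonal part
```

With all signs equal the function reduces to the plain transpose of Lemma 3.13; with mixed signs only the off-diagonal blocks linking timelike and spacelike directions change sign.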

3.16. Lemma. An operator $A$ on an inner product space is orthogonal if and
only if with respect to an orthonormal basis it has a matrix whose columns
(respectively, rows) regarded as column (respectively, row) vectors form an
orthonormal set in the standard inner product. (This is also true for metric
vector spaces, and little harder to prove (Exercise 4).)

Proof.

$$\begin{aligned}
A \text{ is orthogonal} &\iff A^T A = I \quad (2.09) \\
&\iff [A^T]^i_k [A]^k_j = \delta^i_j \\
&\iff \sum_{k=1}^{n} a^k_i a^k_j = \delta_{ij}
\end{aligned}$$

which is exactly the statement that the columns of $[A]$ are an orthonormal
set.

For the “rows” version, apply Lemma 2.10. □
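A small exact check of the column and row criteria of Lemma 3.16, assuming the standard inner product on $\mathbf{R}^2$. The rotation with rational entries and the shear are illustrative choices, and the helper names are hypothetical.

```python
from fractions import Fraction

def columns(m):
    """Columns of a matrix given as a list of rows."""
    return [list(c) for c in zip(*m)]

def is_orthonormal_set(vectors):
    """True if u . v = delta_ij in the standard inner product."""
    return all(
        sum(p * q for p, q in zip(u, v)) == (1 if i == j else 0)
        for i, u in enumerate(vectors)
        for j, v in enumerate(vectors)
    )

# A rotation with rational entries (3-4-5 triangle), so arithmetic is exact.
rotation = [[Fraction(3, 5), Fraction(-4, 5)],
            [Fraction(4, 5), Fraction(3, 5)]]
shear = [[1, 1], [0, 1]]          # preserves area but not lengths

for name, m in [("rotation", rotation), ("shear", shear)]:
    print(name, is_orthonormal_set(columns(m)), is_orthonormal_set(m))
```

For the rotation both tests succeed together and for the shear both fail together, in line with the statement that the column condition and the row condition are equivalent.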

3.17. Corollary. The rows of a matrix are an orthonormal set (in the standard
inner product) if and only if the columns are. □

This fact, rather impressive and magical at the “matrix theory” level,
was the reason for the term “orthogonal” matrix, and hence orthogonal operators.
Since the columns are exactly the coordinates of the images under $A$ of
the vectors in the standard basis, which is orthonormal in the standard inner
product, 3.16 amounts geometrically to the statement that $A$ preserves the
metric for all vectors if it does so for basis vectors. This is true for any basis,
by linearity, but not so simply stated in coordinates for a non-orthonormal
basis.


Exercises IV.3 

1. a) Define a basis $M^{ij}$ for the vector space $L^2(X; \mathbf{R})$ in terms of that
chosen for $X$ such that

$$G = g_{ij} M^{ij}$$

where the $g_{ij}$ are those defined in 3.01. (Thus the $g_{ij}$ are components
of $G$ in the vector space $L^2(X; \mathbf{R})$, as the $a^i_j$ are for an operator
$A \in L(X; X)$: cf. I.2.07. These two cases are specialisations of more
general definitions given in Chapter V.)
b) If $G$ has components $g_{ij}$ in the basis $\beta$, and a new basis $\beta'$ consists of
the vectors $b'_i = (b^1_i, \ldots, b^n_i)$, $i = 1, \ldots, n$, in $\beta$-coordinates, derive the
formula

$$g'_{ij} = b^k_i\, b^l_j\, g_{kl}$$

for the components of $G$ in $\beta'$-coordinates.

2. If $\{b_1, \ldots, b_r\}$ in $(X, G)$ is an orthogonal set (so $b_i \cdot b_j = 0$ if and only
if $i \neq j$), and $x = a^i b_i = 0$, deduce that $x \cdot b_j = 0$ for $j = 1, \ldots, r$ and
hence that each $a^i = 0$.

3. Deduce Corollary 3.12 from the proof of Lemma 3.11.

4. Prove Lemma 3.16 for metric vector spaces.

5. Show, by the method of proof of Theorem 3.05, that for any $x$ in
4-dimensional Lorentz space with $x \cdot x > 0$ (respectively $x \cdot x < 0$)
there is a choice of coordinates giving $x$ the form $(t, 0, 0, 0)$ (respectively
$(0, x, 0, 0)$) and giving the metric the standard expression of
Definition 1.03.

6. Show that $\{(1,0,0,1), (0,1,-1,0), (1,0,0,-1), (0,1,1,0)\}$ is an orthonormal
basis for $\mathbf{R}^4$ with the determinant metric (cf. 1.03). Find
the signature of this metric tensor (cf. 3.09).

7. a) Show, by the method of proof of Theorem 3.05, that any unit timelike
vector $v$ can be a member of an orthonormal basis $v, b_2, \ldots, b_n$.

b) Deduce that for any two unit timelike vectors $v$, $w$ there is an orthogonal
operator $A$ with $Av = w$. (Construct bases as in (a), use I.2.05
to find $A$, then establish its orthogonality.)

c) Prove the same thing for $v$, $w$ unit spacelike vectors.

d) Prove the same thing for two null vectors. (Hint: an orthonormal
basis indicates how to write a null vector as $s + t$, with spacelike $s$
and timelike $t$ separately moveable.)

e) Deduce that for any two vectors $v$, $w$ there is an orthogonal operator
$A$ with $Av = w$ if and only if $v \cdot v = w \cdot w$.


4. Diagonalising Symmetric Operators 

It is convenient to have an orthonormal basis, so that the matrix of $g_{ij}$'s has
a simple diagonal form. Likewise, if for some operator $A$ we can find a basis
consisting entirely of eigenvectors of $A$, then $[A]$ will be very simple. For, if
$b_i$ belongs to $\lambda_i$ (with $\lambda_1, \ldots, \lambda_n$ not necessarily distinct) we have

$$\begin{aligned}
A(x^1, \ldots, x^n) &= A(x^i b_i) \\
  &= x^i (A b_i) \\
  &= x^i (\lambda_i b_i) \quad \text{(by I.3.07)} \\
  &= (\lambda_1 x^1, \ldots, \lambda_n x^n)
\end{aligned}$$

so that the matrix of $A$ is just

$$\begin{bmatrix}
\lambda_1 & & & 0 \\
 & \lambda_2 & & \\
 & & \ddots & \\
0 & & & \lambda_n
\end{bmatrix} .$$

Fig. 4.1


This is algebraically convenient and geometrically clear: it breaks $A$
down into scalar multiplication in various directions. This simplification,
however, is not worth it if the basis of eigenvectors is not orthonormal in the
metric we are using: it is more helpful to have the metric in a simple form
than to simplify an operator. The great advantage of symmetric operators,
self-adjoint operators on an inner product space, is that we can have both,
as we establish in the next few lemmas. The idea involved is as follows.

If we have an operator $A$ for which there is an orthonormal basis of eigenvectors,
which is in fact true if $A$ is symmetric, the unit sphere $\{\, x \mid x \cdot x = 1 \,\}$
is taken by $A$ to an ellipsoid with the eigenvectors along the principal axes.
Examples for two and three dimensions are shown in Fig. 4.1. There the
arrows represent unit eigenvectors, and the circle/sphere represents the unit
sphere, carried by $A$ to the arrows and the ellipse/ellipsoid.

So we can find the eigenvectors, starting with the one(s) belonging to
the largest eigenvalue, by looking for the biggest vector(s) in first the whole
ellipse, then in slices at right angles to the eigenvectors we have found already.
There is one complication: the operator $A(x, y) = (2x, -2y)$ on $\mathbf{R}^2$,
for example, takes the unit circle to a larger circle, in which all of the vectors
are equally biggest, but not all eigenvectors of $A$. They are, however, eigenvectors
of $A^2$, and it turns out that by an algebraic trick (Lemma 4.03) the
eigenvectors of $A^2$ easily lead to those of $A$.

4.01. Definition. If $A$ is any operator on an inner product space $X$, a vector
$x$ is maximal for $A$ if $x$ is a unit vector and

$$Ax \cdot Ax \geq Ay \cdot Ay$$

for all unit vectors $y \in X$.

For $X$ finite dimensional, it turns out that $A$ must always have maximal
vectors. For $x \mapsto Ax \cdot Ax$ is a continuous real-valued function on the set
of unit vectors, which is closed and bounded in $X$ (which can be taken as a
copy of $\mathbf{R}^n$ here). Hence its maximum value exists and is attained, as proved
below in Chapter VI 4.12. (This is essentially a topological fact, which is why
we must defer its proof till we have the right machinery. The proof will not
depend on the diagonalisability of symmetric operators, so we are not being
circular.)

If $x$ is maximal for $A$, then $\|Ax\| = \max\{\, \sqrt{Ay \cdot Ay} \mid y \in X,\ y \cdot y = 1 \,\}$
is often called the norm of $A$ and denoted by $\|A\|$. For all $y$, then, we
have $\|Ay\| \leq \|A\| \cdot \|y\|$. (Normalise $y$, apply the definition of $\|A\|$, and
denormalise; Exercise 1.)

4.02. Lemma. If $x$ is a maximal vector of a symmetric operator $A$ on an
inner product space $X$, then $x$ is an eigenvector of the operator $A^2$, belonging
to the eigenvalue $\|A\|^2$.

Proof.

$$\begin{aligned}
\|A\|^2 = \|Ax\|^2 &= Ax \cdot Ax && \text{(definition of } \|Ax\|) \\
  &= A^2 x \cdot x && (A \text{ symmetric}) \\
(*) \qquad &\leq \|A^2 x\|\, \|x\| && \text{(Lemma 1.07)} \\
  &= \|A^2 x\| && (x \text{ a unit vector}) \\
  &= \|A(Ax)\| \\
  &\leq \|A\|\, \|Ax\| \\
  &\leq \|A\| (\|A\|\, \|x\|) \\
  &= \|A\|^2 . && (x \text{ a unit vector})
\end{aligned}$$

So all of the inequalities must actually be equalities in this case, since they
are squeezed in between equal quantities. But in the Schwarz inequality (*),
equality only holds if $A^2 x = xa$ for some $a \in \mathbf{R}$. Thus $x$ is an eigenvector
of $A^2$, with eigenvalue

$$a = a(x \cdot x) = (xa) \cdot x = (A^2 x) \cdot x = \|A\|^2 . \quad □$$

4.03. Lemma. A symmetric operator $A$ on an inner product space has an
eigenvector belonging to an eigenvalue $+\|A\|$ or $-\|A\|$.

Proof. Take a maximal vector $x$ of $A$. Then by 4.02,

$$A^2 x = x \|A\|^2 ,$$

$$(A^2 - \|A\|^2 I) x = 0 .$$

Hence

$$(A + \|A\| I)(A - \|A\| I) x = 0 .$$

Hence either $(A - \|A\| I) x = 0$, in which case $x$ is an eigenvector of $A$
belonging to the eigenvalue $+\|A\|$, or not, in which case $(A - \|A\| I) x$ is an
eigenvector of $A$ belonging to the eigenvalue $-\|A\|$ (cf. Exercise 2). □
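The factorisation trick in the proof of Lemma 4.03 can be traced on the operator $A(x, y) = (2x, -2y)$ mentioned earlier, for which $\|A\| = 2$ and every vector satisfies $A^2 x = 4x$. The sketch below uses unnormalised integer vectors, an assumption made only to keep the arithmetic exact (the algebra of the proof does not depend on the length of $x$); the function names are hypothetical.

```python
def apply(a, x):
    """Matrix a (list of rows) acting on the vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def eigenvector_from(a, norm_a, x):
    """Given x with A^2 x = ||A||^2 x, return (eigenvector, eigenvalue) of A
    following the two cases in the proof of Lemma 4.03."""
    ax = apply(a, x)
    v = [p - norm_a * q for p, q in zip(ax, x)]   # (A - ||A|| I) x
    if any(v):
        return v, -norm_a          # v is killed by (A + ||A|| I)
    return list(x), norm_a         # x itself belongs to +||A||

a = [[2, 0], [0, -2]]
norm_a = 2
for x in [(1, 0), (0, 1), (1, 1)]:
    v, lam = eigenvector_from(a, norm_a, x)
    print(x, "->", v, "eigenvalue", lam)
```

The vector $(1, 1)$ is not itself an eigenvector of $A$, but $(A - 2I)(1, 1)$ is one, belonging to $-2$; the vector $(1, 0)$ falls under the first case and belongs to $+2$.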

4.04. Lemma. If $x$ is an eigenvector of a self-adjoint operator $A$ on a metric
vector space, then

$$x \cdot y = 0 \Rightarrow x \cdot Ay = 0 .$$

That is, $A(x^\perp) \subseteq x^\perp$, so that $y \mapsto Ay$ defines an operator on $x^\perp$: that
induced by $A$. (This is not true for $A$ not self-adjoint (Exercise 3), nor very
useful if $x \in x^\perp$.)

Proof. If $x$ belongs to the eigenvalue $\lambda$, then

$$x \cdot y = 0 \Rightarrow \lambda(x \cdot y) = 0 \Rightarrow (x\lambda) \cdot y = 0 \Rightarrow Ax \cdot y = 0 \Rightarrow x \cdot Ay = 0 . \quad □$$

4.05. Theorem. If $A$ is a symmetric operator on an inner product space $X$,
then $X$ has an orthonormal basis of eigenvectors of $A$.

Proof. Let $\dim X = n$.

Find an eigenvector $x_1$ by Lemma 4.03, and set

$$b_1 = \frac{x_1}{\|x_1\|}$$

to give a unit eigenvector. Then by Lemma 4.04 we can consider the induced
operator

$$A' : b_1^\perp \to b_1^\perp : x \mapsto Ax .$$

This is again symmetric, with respect to the restriction of the inner product.
Hence we can find a unit eigenvector $b_2$ of $A'$, which is then also an
eigenvector of $A$ and orthogonal to $b_1$ since it is in $b_1^\perp$. Inductively, we may
continue this process, each time producing a unit eigenvector orthogonal to
all the previous ones. Since we decrease the dimension of the space (to which
we apply 4.03) by one each time, we produce exactly $n$ eigenvectors before
we run out of space. These are an orthonormal set of $n$ vectors, hence independent
and a basis. □

4.06. Corollary. $A$ can be represented by a diagonal matrix with respect to
an orthonormal basis. □

4.07. Corollary. If $\mu$ is a root of multiplicity $m$ of the characteristic equation

$$\det(A - \lambda I) = 0$$

then the eigenspace belonging to $\mu$ has dimension $m$. (This applies as a general
result only if the metric is an inner product and $A$ symmetric; cf. Exercise 5.)

Proof. Represent $A$, via an orthonormal basis $b_1, \ldots, b_n$, by a diagonal matrix
with entries the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ (not necessarily distinct or
non-zero). Then

$$\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)$$

so that $\mu$ is a root with multiplicity $m$ if and only if $\lambda_j = \mu$ for exactly $m$ $j$'s,
$\lambda_{j_1} = \lambda_{j_2} = \cdots = \lambda_{j_m} = \mu$, say. Then the subspace spanned by $b_{j_1}, \ldots, b_{j_m}$
is exactly the eigenspace belonging to $\mu$. □

4.08. Corollary. All the roots of the characteristic equation of $A$ are real.

□


4.09. Corollary. In an inner product space $(X, G)$ for any symmetric bilinear
form $h$ on $X$ we can find an orthonormal basis $b_1, \ldots, b_n$ for $X$ such
that

$$h(b_i, b_j) = 0 \quad \text{if } i \neq j .$$

Proof. For $x \in X$ define

$$h_x : X \to \mathbf{R} : y \mapsto h(x, y) .$$

Next define

$$A_h : X \to X : x \mapsto G_\uparrow(h_x) .$$

Then we have

$$\begin{aligned}
A_h x \cdot y &= (G_\downarrow(A_h x))\, y = h_x(y) = h(x, y) \\
  &= h(y, x) = h_y(x) = A_h y \cdot x = x \cdot A_h y ,
\end{aligned}$$

since $G$ and $h$ are symmetric. Choosing by 4.05 an orthonormal basis
$b_1, \ldots, b_n$ which diagonalises $A_h$ we have

$$h(b_i, b_j) = A_h b_i \cdot b_j = \lambda_i (b_i \cdot b_j) = 0 \quad \text{if } i \neq j .$$

(Unless $h$ is non-degenerate, some of the eigenvalues $\lambda_i$ of $A_h$ will be zero.) □

4.10. Definition. The $b_i$ of 4.09 are called principal directions of $h$. Notice
that if $\det(A - \lambda I)$ has coincident zeros, the directions are not uniquely
defined. For if $b_1$ and $b_2$ both belong to $\lambda$, then so do $b'_1 = b_1 + b_2$ and $b'_2 =
b_1 - b_2$. Thus $b'_1 / \|b'_1\|,\ b'_2 / \|b'_2\|,\ b_3, \ldots, b_n$ is an orthonormal basis diagonalising
$A$ and $h$ equally well.

If they are completely indeterminate, that is if $A$ has one eigenvalue $\lambda$
to which all of $X$ belongs, $h$ is called isotropic.

4.11. Lemma. If $h$ is isotropic, then $h = \lambda G$ for some $\lambda \in \mathbf{R}$ and the
corresponding $A = \lambda I$. □

4.12. Remark. Theorems 4.05, 4.09 are not (unlike for instance the existence
of orthonormal bases, 3.05) true whether $G$ is definite or indefinite.
For, if $A : H^2 \to H^2 : (x, y) \mapsto (x - y, x + y)$, $A$ is self-adjoint and
$h((x, y), (x', y')) = A(x, y) \cdot (x', y') = xx' - x'y - xy' - yy'$ is correspondingly
symmetric. But $A$ has no eigenvectors and $h$ has no principal directions. (Either
look at the characteristic equation of $A$ (cf. I.3.13) or draw pictures -
preferably both; cf. also Exercise 5.) We shall need:

4.13. Lemma. If a self-adjoint linear operator $A$, on Lorentz space $L^4$
(cf. 1.03), has a timelike eigenvector $v$, then $L^4$ has an orthonormal basis
of eigenvectors of $A$.

Proof. By 4.04, $A$ restricts to an operator on $v^\perp$, which is non-degenerate
by 2.06. By the arguments of the proof of 3.08, $G$ is negative definite on $v^\perp$,
and thus an inner product. Hence 4.05 applies and we have three spacelike
orthonormal eigenvectors for $A|_{v^\perp}$. These, with $v$, provide the required basis.

□
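The counterexample of Remark 4.12 can be verified directly. The sketch below assumes that the metric on $H^2$ is $(x, y) \cdot (x', y') = xx' - yy'$, which is what the displayed expression for $h$ implies; it checks self-adjointness on a grid of integer vectors and computes the discriminant of the characteristic equation $\lambda^2 - (\operatorname{tr} A)\lambda + \det A = 0$. The function names are hypothetical.

```python
def A(v):
    """The operator of Remark 4.12: (x, y) -> (x - y, x + y)."""
    x, y = v
    return (x - y, x + y)

def dot_h2(u, v):
    """Assumed metric on H^2: one timelike and one spacelike direction."""
    return u[0] * v[0] - u[1] * v[1]

def h(u, v):
    """The bilinear form h(u, v) = A(u) . v."""
    return dot_h2(A(u), v)

matrix = [[1, -1], [1, 1]]                      # matrix of A in the standard basis
trace = matrix[0][0] + matrix[1][1]
det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
discriminant = trace ** 2 - 4 * det             # negative: no real eigenvalues

grid = [(p, q) for p in range(-2, 3) for q in range(-2, 3)]
self_adjoint = all(dot_h2(A(u), v) == dot_h2(u, A(v)) for u in grid for v in grid)
symmetric = all(h(u, v) == h(v, u) for u in grid for v in grid)
print(discriminant, self_adjoint, symmetric)
```

A negative discriminant means the characteristic equation has no real roots, so $A$ has no eigenvectors even though it is self-adjoint for the indefinite metric.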


Exercises IV.4 


1. Prove the inequality $\|Ax\| \leq \|A\|\, \|x\|$ of 4.01.

2. Draw the pictures corresponding to the two possibilities involved in the
proof of Lemma 4.03, in the case of the operator $A(x, y) = (2x, -2y)$
on $\mathbf{R}^2$. What difference would it make if we factorised $(A^2 - \|A\|^2 I)$
as $(A - \|A\| I)(A + \|A\| I)$?

3. Give an example of an operator $A$, on $\mathbf{R}^2$ with the standard inner
product, having an eigenvector $x = (0, 1)$ and such that $A(x^\perp) \not\subseteq x^\perp$.

4. If $x$, $y$ are eigenvectors of a symmetric operator belonging to eigenvalues
$\lambda$, $\mu$, with $\lambda \neq \mu$, then

$$\lambda(x \cdot y) = \mu(x \cdot y) .$$

Deduce that eigenvectors belonging to distinct eigenvalues are orthogonal.
(cf. Exercise I.3.7 for the non-symmetric case.)

5. a) If $A : \mathbf{R}^2 \to \mathbf{R}^2$ has the matrix
$\begin{bmatrix} 1 & 1 \\ -1 & 3 \end{bmatrix}$, show that $\det(A - \lambda I) = 0$
has 2 as a root with multiplicity 2, but the eigenspace belonging to 2
has dimension only 1. (Draw it!)

b) Show that $A$ is self-adjoint as an operator on $H^2$ (1.05).

“In that One Void the two are not distinguished:

each contains complete within itself the ten thousand forms.”

Seng-ts’an 


1. Multilinear Forms 

Starting with a vector space $X$, we already have several spaces derived from
it, such as the dual space $X^*$ and the spaces $L^2(X; \mathbf{R})$ and $L^2(X^*; \mathbf{R})$ of
bilinear forms on $X$ and $X^*$. We shall now produce some more. Fortunately,
rather than adding more spaces to an ad hoc list, all as different as, say, $X^*$
and $L^2(X; \mathbf{R})$ seem from each other, our new construction gives a general
framework in which all the spaces so far considered occur as special cases.
(This gathering of apparently very different things into one grand structure 
where they appear as examples is common in mathematics - both because it 
is often a very powerful tool and because many mathematicians have great 
difficulty in remembering facts they can’t deduce from a framework, like 
the atomic weight of copper or the date of the battle of Pondicherry. This 
deficiency is often what pushed them to the subject and away from chemistry 
or history in the first place, at school.) 

We start by defining a generalisation of bilinear forms. 

1.01. Definition. A function

$$f : X_1 \times X_2 \times \cdots \times X_n \to Y$$

where $X_1, \ldots, X_n$, $Y$ are vector spaces, is a multilinear mapping if

(i) $f(x_1, \ldots, x_i + x'_i, \ldots, x_n) = f(x_1, \ldots, x_i, \ldots, x_n) + f(x_1, \ldots, x'_i, \ldots, x_n)$

(ii) $f(x_1, \ldots, x_i a, \ldots, x_n) = (f(x_1, \ldots, x_i, \ldots, x_n))a$

for any $x_1 \in X_1, \ldots, x_n \in X_n$ and $x'_1 \in X_1, \ldots, x'_n \in X_n$, $i \in \{1, \ldots, n\}$,
$a \in \mathbf{R}$. The vector space (cf. Exercise 1) of all such functions is denoted by

$$L(X_1, \ldots, X_n; Y) .$$

In particular, denote $L(X, \ldots, X; Y)$ (with $X$ taken $n$ times) by $L^n(X; Y)$.
If $f \in L^n(X; \mathbf{R})$, $f$ is called a multilinear form on $X$. (Notice that
$L^1(X; \mathbf{R}) = L(X; \mathbf{R}) = X^*$, and that $L^2(X; \mathbf{R})$ has already been introduced
(IV.1.01) under exactly this symbol.)

1.02. Examples. 

(i) Let $X$ be a three-dimensional space and $f(x_1, x_2, x_3)$ be the volume of
the parallelepiped determined by $x_1$, $x_2$ and $x_3$ (with the negative volume
if they are in “left-hand” order as in Fig. 1.1). Euclidean geometry
will show that $f$ is multilinear on $X$ (Exercise 2a).

Fig. 1.1

(ii) If we define

$$f : L(W; X) \times L(X; Y) \times L(Y; Z) \to L(W; Z) : (A, B, C) \mapsto CBA$$

then $f \in L(L(W; X), L(X; Y), L(Y; Z); L(W; Z))$.

(iii) If we define

$$f : X \times X^* \to \mathbf{R} : (x, g) \mapsto g(x)$$

then $f$ is linear in the first variable by the linearity of each $g \in X^*$, in
the second by the definition of addition and scalar multiplication in $X^*$.
It is thus a (highly important) vector in $L(X, X^*; \mathbf{R})$.

(iv) If $x_1, \ldots, x_n$ are $n$ specified vectors in $X$ and we define

$$f : X^* \times X^* \times \cdots \times X^* \to \mathbf{R} : (g_1, g_2, \ldots, g_n) \mapsto g_1(x_1) g_2(x_2) \cdots g_n(x_n)$$

then we have $f \in L^n(X^*; \mathbf{R})$, by a straightforward check.

Dually, if $g_1, \ldots, g_n$ are $n$ specified linear functionals on $X$ and we
define

$$g : X \times X \times \cdots \times X \to \mathbf{R} : (x_1, x_2, \ldots, x_n) \mapsto g_1(x_1) g_2(x_2) \cdots g_n(x_n)$$

then we have $g \in L^n(X; \mathbf{R})$. Notice that we may in this way “multiply”
$g_1, \ldots, g_n$, but that we get a higher-order multilinear form by doing so,
not just another functional. If we tried to define a “product functional”
$g_1 g_2 \cdots g_n$ by

$$(g_1 g_2 \cdots g_n) x = g_1(x) g_2(x) \cdots g_n(x)$$

then we would have

$$\begin{aligned}
(g_1 g_2 \cdots g_n)(xa) &= a g_1(x)\, a g_2(x) \cdots a g_n(x) \\
  &= a^n g_1(x) g_2(x) \cdots g_n(x) \\
  &= a^n (g_1 g_2 \cdots g_n) x
\end{aligned}$$

which is clearly not linear, since $a^n \neq a$ in general, and so our “product”
is not a functional. The $g$ we have defined lies in a higher space; we call it
therefore the tensor product $g_1 \otimes g_2 \otimes \cdots \otimes g_n$ of $g_1, \ldots, g_n$ to distinguish
it clearly from all the ordinary products, in many situations, which take
two or more objects and give another of the same type. (The term inner
product is never abbreviated to “product”, for the same reason.)
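The contrast drawn in example (iv) between the tensor product of functionals and the naive pointwise "product functional" can be sketched as follows. Functionals on $\mathbf{R}^2$ are represented by coefficient tuples; that representation, the sample values and the names `functional`, `tensor`, `pointwise` are all assumptions made for illustration.

```python
def functional(coeffs):
    """The linear functional x -> sum_i coeffs[i] * x[i]."""
    return lambda x: sum(c * xi for c, xi in zip(coeffs, x))

def tensor(*gs):
    """g_1 (x) ... (x) g_n : (x_1, ..., x_n) -> g_1(x_1) ... g_n(x_n)."""
    def t(*xs):
        if len(xs) != len(gs):
            raise TypeError("need exactly one vector per functional")
        product = 1
        for g, x in zip(gs, xs):
            product *= g(x)
        return product
    return t

def pointwise(*gs):
    """The would-be 'product functional' x -> g_1(x) ... g_n(x)."""
    def p(x):
        product = 1
        for g in gs:
            product *= g(x)
        return product
    return p

g1, g2 = functional((1, 2)), functional((3, -1))
t, p = tensor(g1, g2), pointwise(g1, g2)

x, x2, y = (1, 1), (2, 0), (0, 1)
double = lambda v: tuple(2 * c for c in v)
add = lambda u, v: tuple(a + b for a, b in zip(u, v))

print(t(double(x), y), 2 * t(x, y))     # linear in the first slot
print(p(double(x)), 2 * p(x))           # scales by 2**2, so not linear
```

The tensor product scales by $a$ in each slot separately, while the pointwise product scales by $a^n$, which is the failure of linearity computed in the text.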

1.03. Tensor Products. We have a map

$$\otimes : X_1^* \times X_2^* \times \cdots \times X_n^* \to L(X_1, \ldots, X_n; \mathbf{R}) : (g_1, g_2, \ldots, g_n) \mapsto g_1 \otimes g_2 \otimes \cdots \otimes g_n$$

where $g_1 \otimes g_2 \otimes \cdots \otimes g_n$ is defined exactly as above, which is evidently
multilinear (Exercise 3). It is not, however, surjective in general. This is
most easily seen in an example:

$$\otimes : (\mathbf{R}^2)^* \times (\mathbf{R}^2)^* \to L^2(\mathbf{R}^2; \mathbf{R})$$

takes a pair $f$, $g$ of linear functionals on $\mathbf{R}^2$ to a bilinear form $(x, y) \mapsto
f(x) g(y)$ on $\mathbf{R}^2$. But such a form cannot for instance be non-degenerate.
For if we choose a non-zero $x \in \ker f$, which is always possible (by I.2.10)
since $\dim(\ker f) \geq 1$, then $f(x) g(y) = 0$ for all $y \in \mathbf{R}^2$.

However, any bilinear form can be expressed in terms of its effects on
basis elements, in the manner of IV.3.01, for any basis $b_1$, $b_2$ of $\mathbf{R}^2$:

$$\begin{aligned}
F(x, y) &= F((x^1, x^2), (y^1, y^2)) = F(x^1 b_1 + x^2 b_2,\ y^1 b_1 + y^2 b_2) \\
  &= x^1 y^1 F(b_1, b_1) + x^1 y^2 F(b_1, b_2) + x^2 y^1 F(b_2, b_1) + x^2 y^2 F(b_2, b_2) \\
  &\qquad \text{by bilinearity of } F \\
  &= x^i y^j F(b_i, b_j) \quad \text{using the summation convention.}
\end{aligned}$$

Setting $F(b_i, b_j) = f_{ij}$, we have

$$\begin{aligned}
F(x, y) &= f_{ij} x^i y^j \\
  &= f_{ij} (b^i(x))(b^j(y)) \\
  &= (f_{ij}\, b^i \otimes b^j)(x, y) .
\end{aligned}$$

Hence

$$F = f_{ij}\, b^i \otimes b^j$$

so that the four tensor products $b^i \otimes b^j$ span the vector space $L^2(\mathbf{R}^2; \mathbf{R})$.
(Note that $b^1 \otimes b^2 \neq b^2 \otimes b^1$; tensor products do not commute.)
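In the dual basis to the standard basis of $\mathbf{R}^2$, the bilinear form $f \otimes g$ has components $f_i g_j$, so the points made in 1.03 can be checked on $2 \times 2$ matrices. This is a sketch with illustrative values; `outer`, `det2` and `add` are hypothetical helper names.

```python
def outer(f, g):
    """Components (f (x) g)_ij = f_i g_j of the simple tensor f (x) g."""
    return [[fi * gj for gj in g] for fi in f]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add(m1, m2):
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

b1, b2 = (1, 0), (0, 1)            # dual basis functionals b^1, b^2 as tuples

# A simple tensor is always degenerate: its component matrix has determinant 0.
f, g = (1, 2), (3, -1)
simple = outer(f, g)

# The standard inner product is b^1 (x) b^1 + b^2 (x) b^2: a sum, not simple.
inner = add(outer(b1, b1), outer(b2, b2))

# Any F is recovered from its components f_ij as f_ij b^i (x) b^j.
F = [[5, -2], [7, 4]]
basis = [b1, b2]
rebuilt = [[0, 0], [0, 0]]
for i in range(2):
    for j in range(2):
        term = [[F[i][j] * e for e in row] for row in outer(basis[i], basis[j])]
        rebuilt = add(rebuilt, term)

print(det2(simple), det2(inner), rebuilt == F, outer(b1, b2) != outer(b2, b1))
```

The zero determinant of `simple` is the coordinate form of the kernel argument above, and the non-zero determinant of `inner` shows that the map $\otimes$ misses at least the non-degenerate forms.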

Thus the tensor products $f \otimes g$ in $L^2(\mathbf{R}^2; \mathbf{R})$ do not constitute a vector
space, since they are not closed under addition - we may have $f \otimes g + f' \otimes g' \neq
f'' \otimes g''$ for any $f''$ and $g''$ in $(\mathbf{R}^2)^*$. This is sad, as a vector space structure
is too useful willingly to do without. However, if we formally put the sums
in with them, subject to the linearity conditions we already have, we get
a vector space which is essentially $L^2(\mathbf{R}^2; \mathbf{R})$ back again. The construction
involved is formal nonsense¹ of the type this book is dedicated to omitting,
and the fact that the result is naturally isomorphic to a space we have already
set up lets us avoid it with no ill consequences.

So: the natural notion of calling the set of all tensor products
$g_1 \otimes g_2 \otimes \cdots \otimes g_n$ the tensor product of the spaces $X_1^*, \ldots, X_n^*$ is unsatisfactory,
because this is not a vector space. However it sits naturally inside,
and (Exercise 4a) spans, $L(X_1, \ldots, X_n; \mathbf{R})$ which is a vector space. (Remember
that $(fa) \otimes g$ is to be identified with, because it is the same function as,
$f \otimes (ga)$.) This is then a good candidate for the tensor product: we set

$$X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^* = L(X_1, \ldots, X_n; \mathbf{R}) .$$

This has the following two properties:

T i) $\otimes : X_1^* \times X_2^* \times \cdots \times X_n^* \to X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^*$ is multilinear.
(Exercise 4b)

T ii) If $f : X_1^* \times X_2^* \times \cdots \times X_n^* \to Y$ is multilinear, then there is a unique
linear map

$$\hat{f} : X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^* \to Y$$

such that $f = \hat{f} \circ \otimes$. (Exercise 4d)

Diagrammatically:

$$\begin{array}{ccc}
X_1^* \times X_2^* \times \cdots \times X_n^* & \xrightarrow{\ \otimes\ } & X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^* \\
 & {\scriptstyle f} \searrow \qquad \swarrow {\scriptstyle \hat{f}} & \\
 & Y &
\end{array}
\qquad \text{commutes.}$$

¹ a respectable pure-mathematical technical term.

These two properties between them pin down the tensor product completely,
and permit us to define it for any set of vector spaces. Discarding
whatever in the above is motivation rather than proof:

1.04. Definition. A tensor product of vector spaces $X_1, \ldots, X_n$ is a space
$X$ together with a map

$$\otimes : X_1 \times X_2 \times \cdots \times X_n \to X$$

having properties T i) and T ii).

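1.05. Lemma. A tensor product of $X_1, \ldots, X_n$ always exists, and any two
are isomorphic in a natural way.

Proof. Existence: We have shown existence for the spaces $X_1^*, \ldots, X_n^*$, since
$L(X_1, \ldots, X_n; \mathbf{R})$ does the job. But $X_i \cong (X_i^*)^*$ naturally, so $L(X_1^*, \ldots, X_n^*; \mathbf{R})$
will serve for $X_1, \ldots, X_n$.

Uniqueness: If $X$, $X'$, with maps $\otimes$, $\otimes'$ both have the properties T i)
and T ii), then:

$$\begin{array}{ccc}
 & X_1 \times X_2 \times \cdots \times X_n & \\
 & {\scriptstyle \otimes} \swarrow \qquad \searrow {\scriptstyle \otimes'} & \\
X & \underset{\Psi}{\overset{\Phi}{\rightleftarrows}} & X'
\end{array}$$

By T i) for $(X, \otimes)$ and T ii) for $(X', \otimes')$ there exists a unique $\Psi$ such that

$$\Psi \otimes' = \otimes .$$

By T i) for $(X', \otimes')$ and T ii) for $(X, \otimes)$ there exists a unique $\Phi$ such that

$$\Phi \otimes = \otimes' .$$

Hence

$$\Psi \Phi \otimes = \Psi \otimes' = \otimes = I_X \otimes .$$

But by T i) for $(X, \otimes)$, $\otimes = I_X \otimes$ is multilinear, hence by the uniqueness
in T ii) for $(X, \otimes)$, this means that $\Psi\Phi = I_X$. Similarly, $\Phi\Psi = I_{X'}$, so that
$X \cong X'$. □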

1.06. Language. We shall not go into the technical justification here, but 
the isomorphism of the theorem is in the strongest sense natural (like that of 
X with (X*)* but not with X*, unless X is allowed extra structure such as a 
metric); it is clear that it involves no arbitrary choices. As a consequence we 
may without confusion use it to regard all tensor products of $X_1, \dots, X_n$ as
essentially the same, and talk of the tensor product. We always denote this
by $X_1 \otimes X_2 \otimes \cdots \otimes X_n$, and its elements will all be sums of scalar multiples
of tensor products:

\[ (x_1 \otimes x_2 \otimes \cdots \otimes x_n)a + (x_1' \otimes x_2' \otimes \cdots \otimes x_n')a' + \cdots \quad \text{(finitely many terms)} \]

where $x_i, x_i' \in X_i$, $a, a', \ldots \in \mathbb{R}$, satisfying the equations

TA) \[ x_1 \otimes \cdots \otimes (x_i + x_i') \otimes \cdots \otimes x_n =
       x_1 \otimes \cdots \otimes x_i \otimes \cdots \otimes x_n + x_1 \otimes \cdots \otimes x_i' \otimes \cdots \otimes x_n \]

TS) \[ (x_1 a) \otimes x_2 \otimes \cdots \otimes x_n = x_1 \otimes (x_2 a) \otimes \cdots \otimes x_n = \cdots
       = x_1 \otimes x_2 \otimes \cdots \otimes (x_n a) = (x_1 \otimes x_2 \otimes \cdots \otimes x_n)a \]

(cf. Exercise 5). This description of the elements could be used to set up
the vector space $X_1 \otimes X_2 \otimes \cdots \otimes X_n$ directly, but this would involve more
definitions. Since by Lemma 1.05 any construction giving something with
properties Ti) and Tii) produces a result essentially the same as any other,
we have chosen a quick one that uses only the tools ready to hand. From now
on, the construction can be forgotten: Ti) and Tii) characterise the tensor
product on spaces, TA) and TS) on vectors, and these between them suffice
for all proofs and manipulations (reduced if need be to coordinate form).

Notice that an element of $X \otimes Y$ need not be of the form $x \otimes y$; it
may be a sum of several such. It need not be a sum in a unique way. For
(Exercise 6a):

\[ x \otimes y + x' \otimes y' = \left(x\tfrac12 + x'\tfrac12\right) \otimes (y + y') + \bigl((x - x') \otimes (y - y')\bigr)\tfrac12 . \]

Vectors in a tensor product of spaces which can be expressed as a single 
tensor product of vectors are called simple tensors ; those which can only be 
expressed as a sum, compound. Note that since the simple tensors span the 
tensor product, a linear map is entirely fixed by the values it takes on them. 
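
As a small numerical illustration of the last two paragraphs, here is a sketch in Python, assuming $X = Y = \mathbb{R}^2$ with standard bases so that $x \otimes y$ is represented by the array of products $x^i y^j$. The helper names (`outer`, `add`, `scale`, `is_simple_2x2`) are ours, not the book's. It checks the identity of Exercise 6a on one example and confirms that the resulting tensor is compound.

```python
from fractions import Fraction

def outer(x, y):
    """Components of the simple tensor x (x) y: entry [i][j] is x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]

def add(s, t):
    """Entrywise sum of two 2-index component arrays."""
    return [[a + b for a, b in zip(rs, rt)] for rs, rt in zip(s, t)]

def scale(t, a):
    """Scalar multiple of a 2-index component array."""
    return [[entry * a for entry in row] for row in t]

def is_simple_2x2(t):
    """A 2 x 2 component array is a single product x (x) y exactly when its
    determinant vanishes (rank at most one)."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0] == 0

half = Fraction(1, 2)
x, x_prime = [1, 0], [0, 1]
y, y_prime = [1, 0], [0, 1]

x_plus = [a + b for a, b in zip(x, x_prime)]
x_minus = [a - b for a, b in zip(x, x_prime)]
y_plus = [a + b for a, b in zip(y, y_prime)]
y_minus = [a - b for a, b in zip(y, y_prime)]

# Two different ways of writing the same element as a sum of simple tensors.
lhs = add(outer(x, y), outer(x_prime, y_prime))
rhs = add(scale(outer(x_plus, y_plus), half),
          scale(outer(x_minus, y_minus), half))

print(lhs == rhs)            # the two sums agree
print(is_simple_2x2(lhs))    # yet the tensor is not a single product
```

The determinant test is special to the $2 \times 2$ case; in general a tensor in $X \otimes Y$ is simple when its component array has rank at most one.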

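1.07. Lemma. There is a natural isomorphism

\[ X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^* \cong (X_1 \otimes X_2 \otimes \cdots \otimes X_n)^* . \]

Proof. Define

\[ \Phi : (X_1 \otimes X_2 \otimes \cdots \otimes X_n)^* \to L(X_1, \dots, X_n; \mathbb{R}) = X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^* \]

by

\[ \Phi : f \mapsto f \circ \otimes \]

(where $\otimes : X_1 \times X_2 \times \cdots \times X_n \to X_1 \otimes X_2 \otimes \cdots \otimes X_n$ as above) and

\[ \Psi : L(X_1, \dots, X_n; \mathbb{R}) \to (X_1 \otimes X_2 \otimes \cdots \otimes X_n)^* : g \mapsto \hat g . \]

Here $\hat g$ is uniquely given for each $g$ by Tii).

Evidently $\Phi$ is linear, and $\Psi$ is an inverse function for it, so that $\Phi$ is
a bijection and hence an isomorphism (I.2.03). (Hence, of course, $\Psi$ also is
linear.)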
Naturality we shall as usual take to follow from lack of special choices
involved, since we do not want to go into the technicalities of category theory;
they are indeed here technicalities only, and this isomorphism is another that
may safely be used to identify two spaces. □

This result, and our techniques for its proof, illustrate the usefulness 
of the tensor product: it swiftly reduces the theory of multilinear forms on 
a collection of spaces to that of linear functionals on a single space - their 
tensor product. We thus do not need to do all our work on functionals over 
again for multilinear forms. 


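1.08. Lemma. For any two vector spaces $X_1$, $X_2$ there is a natural isomorphism

\[ L(X_1; X_2) \to X_1^* \otimes X_2 . \]

Proof. Define

\[ f : X_1^* \times X_2 \to L(X_1; X_2), \qquad (g, x_2) \mapsto \bigl(x_1 \mapsto x_2(g(x_1))\bigr) . \]

Then $f$ is multilinear (Exercise 7a) and so by Tii) induces a linear map

\[ \hat f : X_1^* \otimes X_2 \to L(X_1; X_2) \quad \text{with } \hat f \otimes = f . \]

Now

\[
\begin{aligned}
\hat f(g \otimes x_2) = 0 &\Rightarrow f(g, x_2) = 0 \\
 &\Rightarrow x_2(g(x_1)) = 0 \text{ for all } x_1 \in X_1 \\
 &\Rightarrow x_2 = 0 \text{ or } g = 0 \\
 &\Rightarrow g \otimes x_2 = 0 .
\end{aligned}
\]

Hence $\hat f$ is injective (Exercise 7b), and since

\[
\begin{aligned}
\dim(X_1^* \otimes X_2) &= \dim X_1^* \dim X_2 && \text{(Exercise 4c)} \\
 &= \dim X_1 \dim X_2 && \text{(III.1.04)} \\
 &= \dim(L(X_1; X_2)) && \text{(by I.2.07)}
\end{aligned}
\]

it is an isomorphism, as required. □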

This is a very important and very useful isomorphism (thanks again to
naturality). It is far more often helpful to think of $L(X; Y)$ than of $X^* \otimes Y$.
We shall generally identify the two, just as we identify $(X^*)^*$ and $X$.
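
To make the identification concrete, the following Python sketch assumes $X_1 = \mathbb{R}^3$ and $X_2 = \mathbb{R}^2$ with standard bases, a functional $g$ given by its components $g_j$, and illustrative helper names of our own. It compares the map $x_1 \mapsto x_2(g(x_1))$ with multiplication by the matrix $[x_2^i g_j]$ of the corresponding element of $L(X_1; X_2)$.

```python
def apply_simple(g, x2, x1):
    """The map attached to (g, x2) in Lemma 1.08, applied to x1: x2 times g(x1)."""
    g_of_x1 = sum(gj * xj for gj, xj in zip(g, x1))
    return [c * g_of_x1 for c in x2]

def matrix_of_simple(g, x2):
    """Matrix of the same linear map in standard bases: entry [i][j] = x2[i] * g[j]."""
    return [[x2i * gj for gj in g] for x2i in x2]

def mat_vec(a, x):
    """Ordinary matrix-times-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

g = [2, -1, 3]      # components of a functional on X1 = R^3 (sample values)
x2 = [1, 4]         # a vector in X2 = R^2 (sample values)
x1 = [5, 7, -2]     # a test vector in X1

direct = apply_simple(g, x2, x1)
via_matrix = mat_vec(matrix_of_simple(g, x2), x1)
print(direct, via_matrix)
```

The matrix has $\dim X_1 \cdot \dim X_2 = 6$ entries, in line with the dimension count in the proof.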

1.09. Tensor Products of Maps. If we have linear maps $A_i : X_i \to Y_i$,
$i = 1, \dots, n$, then the composite map

\[
\begin{array}{ccccc}
X_1 \times \cdots \times X_n & \xrightarrow{(A_1, \dots, A_n)} & Y_1 \times \cdots \times Y_n & \xrightarrow{\ \otimes\ } & Y_1 \otimes \cdots \otimes Y_n \\
(x_1, \dots, x_n) & \mapsto & (A_1 x_1, \dots, A_n x_n) & \mapsto & A_1 x_1 \otimes \cdots \otimes A_n x_n
\end{array}
\]

is multilinear (check!) and hence induces by Tii) a unique linear map

\[ A_1 \otimes A_2 \otimes \cdots \otimes A_n : X_1 \otimes X_2 \otimes \cdots \otimes X_n \to Y_1 \otimes Y_2 \otimes \cdots \otimes Y_n , \]

with, on simple elements,

\[ (A_1 \otimes A_2 \otimes \cdots \otimes A_n)(x_1 \otimes x_2 \otimes \cdots \otimes x_n) = A_1 x_1 \otimes A_2 x_2 \otimes \cdots \otimes A_n x_n . \]

This is called the tensor product of the maps $A_1, \dots, A_n$.
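
In coordinates, with two factors and product bases ordered so that the first index varies slowest, the matrix of $A \otimes B$ is the array of all products of entries of $A$ and $B$ (often called the Kronecker product; that name and the flattening convention are our assumptions, not the text's). A minimal Python sketch checking the defining property on one simple element:

```python
def mat_vec(a, x):
    """Ordinary matrix-times-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def tensor_vec(x, y):
    """Components of x (x) y as a flat list, index of x varying slowest."""
    return [xi * yj for xi in x for yj in y]

def tensor_mat(a, b):
    """Matrix of A (x) B in the product bases: row (i, k), column (j, l)
    holds a[i][j] * b[k][l]."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

A = [[1, 2], [3, 4], [0, 1]]   # a map R^2 -> R^3 (sample values)
B = [[2, 0], [1, -1]]          # a map R^2 -> R^2 (sample values)
x, y = [1, -2], [3, 5]

left = mat_vec(tensor_mat(A, B), tensor_vec(x, y))    # (A (x) B)(x (x) y)
right = tensor_vec(mat_vec(A, x), mat_vec(B, y))      # Ax (x) By
print(left == right)
```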

1.10. Notation. The most important cases of tensor products of spaces are
of the type

\[ \underbrace{X \otimes X \otimes \cdots \otimes X}_{k \text{ times}} \otimes \underbrace{X^* \otimes X^* \otimes \cdots \otimes X^*}_{h \text{ times}} \]

for some particular space $X$. This is denoted by $X^k_h$. Vectors in $X^k_h$ are
called tensors on $X$, covariant of degree $h$ and contravariant of degree $k$, or
of type $\binom{k}{h}$. We abbreviate $X^k_0$ to $X^k$, $X^0_h$ to $X_h$.

Evidently, $X = X^1$ and $X^* = X_1$.

By convention $X^0_0 = \mathbb{R}$.

By Exercise 4c, $\dim(X^k_h) = (\dim X)^{k+h}$.

Sometimes such a space arises in a less tidy sequence, such as

\[ X \otimes X \otimes X^* \otimes X \otimes X^* \otimes X^* \]

(for instance, as the tensor product of two tidy ones, $X \otimes X \otimes X^*$ and
$X \otimes X^* \otimes X^*$; cf. Exercise 8a) and while it is legitimate (Exercise 8b) to
reshuffle them if we wish, it may be inconvenient - $x \otimes x'$ could then mean
one thing before the shuffle and another after. So, for example, the space

\[ X \otimes X \otimes X \otimes X^* \otimes X^* \otimes X \otimes X \otimes X^* \otimes X \otimes X \]

will be denoted by $X^{3\ \,2\ \,2}_{\ \,2\ \,1}$. Its elements will then be covariant of degree
$2 + 1 = 3$, contravariant of degree $3 + 2 + 2$, and of type $\left({}^{3\ \,2\ \,2}_{\ \,2\ \,1}\right)$.

Since tensors on X are simply vectors in a space constructed from X, 
we shall denote them by symbols x, y, etc., of the same kind (modified by 
the habits we have already got into of /, y, etc., for functionals, G for a 
metric tensor, etc., when convenient). Various notations such as bold sans- 
serif capitals are in use, dating from the days when tensors were thought 
mysterious and impressive, but this is unnecessary. Moreover, the borderland 
between those who never use anything but “indexed quantities” and those 
who reserve fancy type for much fancier objects is fast disappearing. So we
shall not worry the typist.²

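1.11. Contraction. For any mixed tensor space, $X^{3\ \,1}_{\ \,1\ \,2}$ for example, if we
choose one copy of $X^*$ and one of $X$, say the 3rd $X$ and the 2nd $X^*$ for
definiteness here,

\[ X \otimes X \otimes \overset{\downarrow}{X} \otimes X^* \otimes X \otimes \overset{\downarrow}{X^*} \otimes X^* \]

then we can define the corresponding linear contraction map

\[ \tilde C^3_2 : X \otimes X \otimes X \otimes X^* \otimes X \otimes X^* \otimes X^* \to X \otimes X \otimes X^* \otimes X \otimes X^* \]

on simple elements by

\[ \tilde C^3_2(x_1 \otimes x_2 \otimes x_3 \otimes f_1 \otimes x_4 \otimes f_2 \otimes f_3) = (x_1 \otimes x_2 \otimes f_1 \otimes x_4 \otimes f_3)\, f_2(x_3) \]

(cf. Exercise 9). The image under this map of a tensor $x$ on $X$ is called
a contraction of $x$. (We can distinguish $\tilde C^3_2$ from a component (1.12) by
the presence of the twiddle. We omit the suffixes when possible without
ambiguity.)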

A contraction map lowers both covariant and contravariant degree by
one. If the original degrees are equal, successive contractions define a map
right down to $X^0_0 = \mathbb{R}$, but not uniquely. (There are $k!$ possible total
contractions $X^k_k \to \mathbb{R}$, according to how we pair off the $X$'s and the $X^*$'s, and
$k! = 1$ only if $k = 0$ or $1$.)
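
The count of $k!$ total contractions can be seen on a small example. The sketch below is in Python, assuming $X = \mathbb{R}^2$ with the standard basis and its dual, and using helper names of our own choosing. It lists all total contractions of the simple tensor $x_1 \otimes x_2 \otimes f_1 \otimes f_2 \in X^2_2$; the two pairings give $f_1(x_1) f_2(x_2)$ and $f_2(x_1) f_1(x_2)$, which need not agree.

```python
from itertools import permutations, product

def total_contractions(component, n, k):
    """All k! total contractions of a tensor in X^k_k over an n-dimensional X.

    `component` maps a tuple of k upper indices followed by k lower indices
    to a number. For each pairing, lower slot s is summed against upper
    slot pairing[s]."""
    results = []
    for pairing in permutations(range(k)):
        total = 0
        for upper in product(range(n), repeat=k):
            lower = tuple(upper[pairing[s]] for s in range(k))
            total += component(upper + lower)
        results.append(total)
    return results

x1, x2 = [1, 0], [0, 1]    # vectors (sample values)
f1, f2 = [1, 0], [0, 1]    # functionals, by components (sample values)

def component(index):
    """Components of the simple tensor x1 (x) x2 (x) f1 (x) f2."""
    i, j, l, m = index
    return x1[i] * x2[j] * f1[l] * f2[m]

results = total_contractions(component, n=2, k=2)
print(results)
```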

1.12. Components. By Exercise 4, if $b_1, \dots, b_n$ is a basis for $X$ then the
set of all tensors of the form

\[ b_i \otimes b_j \otimes b_k \otimes b^l \otimes b^m , \]

where $i, j, k, l, m$ are (not necessarily distinct) labels drawn from $\{1, 2, \dots, n\}$,
is a basis for $X^3_2$. (Exercise 4 is one of the chief examples of a careful check
that is essential to do, but almost worthless merely to see done. Like a Zen
exercise, you must experience it to gain anything. It should not be hard
unless you have completely lost sight of what is going on, in which case you
should return to the earlier chapters - or find a better book - rather than
wish for this manipulation to be in the text.)

Thus, for any $x \in X^3_2$, $x$ is a unique linear combination (Exercise I.1.8)

\[ x = (b_i \otimes b_j \otimes b_k \otimes b^l \otimes b^m)\, x^{ijk}_{lm} \]


2 Thanks a lot. The typist. 
and we may represent $x$ by its $n^5$ components $x^{ijk}_{lm}$. If we wish to change
basis, to $b'_1, \dots, b'_n$ say, then in the notation of I.2.08 and III.1.07, by precisely
similar arguments, we have new components

\[ x^{i'j'k'}_{l'm'} = \tilde b^{i'}_i\, \tilde b^{j'}_j\, \tilde b^{k'}_k\, b^l_{l'}\, b^m_{m'}\, x^{ijk}_{lm} \]

where

\[ b'_p = (b^1_p, b^2_p, \dots, b^n_p) \]

in the old coordinates, for $p$ (and so also $i$, $j$, etc.) $= 1, \dots, n$, and

\[ \tilde b^{i'}_i\, b^i_{j'} = \delta^{i'}_{j'} , \text{ etc.} \]

This is the traditional definition of a tensor of type $\binom{3}{2}$ as “a set of $n^5$
numbers that transform according to the equation above”. As it stands, that is
frankly meaningless; you can transform any set of $n^5$ numbers by that formula.
A better expression of this approach is, for instance “a covariant tensor
of order 3 is a rule which in any coordinate system allows us to construct $n^3$
numbers (components) $x_{ijk}$, each of which is specified by giving the indices
$i, j, k$ definite values from 1 to $n$ such that the results in two different bases
are related by the formula

\[ x_{i'j'k'} = b^i_{i'}\, b^j_{j'}\, b^k_{k'}\, x_{ijk} \text{”} \]

[Shilov], and so forth for other types. This is then logically satisfactory. The 
reader must decide whether for him it is more illuminating than the approach 
we have chosen. 

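We too say “and so forth for other types”, since the completely general
rule would have to replace $i, j, k, \ldots$ by $i_1, \dots, i_p$, so the formula would
involve terms like $b^{i_p}_{i'_p}$. We shall only add that tensors of type, say, $\left({}^{3\ \,1}_{\ \,1\ \,2}\right)$
are represented by components labelled by expressions of the form $x^{ijk}{}_{l}{}^{m}{}_{np}$,
transforming according to

\[ x^{i'j'k'}{}_{l'}{}^{m'}{}_{n'p'} = \bigl(\tilde b^{i'}_i\, \tilde b^{j'}_j\, \tilde b^{k'}_k\, b^l_{l'}\, \tilde b^{m'}_m\, b^n_{n'}\, b^p_{p'}\bigr)\, x^{ijk}{}_{l}{}^{m}{}_{np} . \]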

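Notice that if

\[ v = (b_{i_1} \otimes \cdots \otimes b_{i_k} \otimes b^{j_1} \otimes \cdots \otimes b^{j_h})\, v^{i_1 \ldots i_k}_{j_1 \ldots j_h} \in X^k_h , \]

and

\[ w = (b_{a_1} \otimes \cdots \otimes b_{a_l} \otimes b^{b_1} \otimes \cdots \otimes b^{b_m})\, w^{a_1 \ldots a_l}_{b_1 \ldots b_m} \in X^l_m , \]

then

\[ v \otimes w = (b_{i_1} \otimes \cdots \otimes b^{j_h} \otimes b_{a_1} \otimes \cdots \otimes b^{b_m})\, v^{i_1 \ldots i_k}_{j_1 \ldots j_h}\, w^{a_1 \ldots a_l}_{b_1 \ldots b_m} \in X^{k\ \,l}_{\ \,h\ \,m} . \]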

So the components of $v \otimes w$ are simply all the possible products of components
of $v$ and $w$. We shall sometimes follow the physics literature practice of
referring to a tensor, $x \in X^{3\ \,1}_{\ \,1\ \,2}$, say, by its “typical component” $x^{ijk}{}_{l}{}^{m}{}_{np}$
when we have a fixed basis or chart in mind: this makes some room for
confusion, but physics students need to get used to it.
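
A small worked check of these statements, sketched in Python for the simplest mixed case $v \otimes w$ with $v \in X$, $w \in X^*$ and $X = \mathbb{R}^2$. The matrix `B` (columns are the new basis vectors in old coordinates) and its inverse `B_inv` are sample values chosen by us; the index conventions follow the transformation rule as written above and are an assumption about the book's notation.

```python
from fractions import Fraction

n = 2
# B[i][p] = i-th old coordinate of the new basis vector b'_p (sample values).
B = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(1)]]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
# B_inv[p][i] plays the role of the inverse matrix entries.
B_inv = [[B[1][1] / det, -B[0][1] / det],
         [-B[1][0] / det, B[0][0] / det]]

v = [Fraction(3), Fraction(-1)]   # contravariant components v^i
w = [Fraction(4), Fraction(5)]    # covariant components w_j

# Components of v (x) w are all products of components of v and w.
x = [[v[i] * w[j] for j in range(n)] for i in range(n)]

# New components: upper indices use the inverse matrix, lower indices use B.
v_new = [sum(B_inv[p][i] * v[i] for i in range(n)) for p in range(n)]
w_new = [sum(B[j][q] * w[j] for j in range(n)) for q in range(n)]
x_new = [[sum(B_inv[p][i] * B[j][q] * x[i][j]
              for i in range(n) for j in range(n))
          for q in range(n)] for p in range(n)]

# Transforming the product agrees with multiplying the transformed factors.
print(x_new == [[v_new[p] * w_new[q] for q in range(n)] for p in range(n)])
```

The number $w(v) = w_i v^i$ comes out the same in both bases, which is the coordinate shadow of the contraction discussed in 1.11.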

Occasionally when we have names for the coordinates, rather than numbers
(such as $(x, y, z)$ not $(x^1, x^2, x^3)$ on $\mathbb{R}^3$, which is sometimes convenient
in saving indices) we shall let the names stand for the numbers: like $t^{xy}_z$
instead of $t^{12}_3$. This has to be used with caution, owing to the summation
convention - $t^{xy}_z$ means something quite different if $x$ is a dummy index.

Contraction has a simple formula in coordinates. On a basis vector
$b_i \otimes b_j \otimes b_k \otimes b^l \otimes b^m$ of $X \otimes X \otimes X \otimes X^* \otimes X^*$, for instance, the effect
of “contracting over $j$ and $l$” (that is, applying $\tilde C^2_1$) is to take it to
$b_i \otimes b_k \otimes b^m\,(b^l(b_j))$. Now,

\[ b^l(b_j) = \delta^l_j \]

by definition. So the image in $X^2_1$ of

\[ x = (b_i \otimes b_j \otimes b_k \otimes b^l \otimes b^m)\, x^{ijk}_{lm} \in X^3_2 , \]

under contraction is a vector whose component along each basis vector
$b_{i'} \otimes b_{k'} \otimes b^{m'}$ of $X^2_1$ is the sum of those $x^{ijk}_{lm}$ having $j = l$ and $i = i'$,
$k = k'$, $m = m'$. Thus $\tilde C^2_1 x$ has coordinates precisely $\delta^l_j x^{ijk}_{lm} = x^{ijk}_{jm}$, using the
summation convention.

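The naturality of the isomorphism $\hat f$ of Lemma 1.08 is illustrated by its
form in coordinates. If we have $a \in X_1^* \otimes X_2$, with

\[ a = (b^j \otimes b'_i)\, a^i_j \]

with respect to bases $b_1, \dots, b_n$ for $X_1$, $b'_1, \dots, b'_m$ for $X_2$, then

\[
\begin{aligned}
(\hat f a)x &= \bigl(b'_i\,(b^j(x))\bigr)\, a^i_j \\
 &= (b'_i\, x^j)\, a^i_j \\
 &= b'_i\,(a^i_j x^j) .
\end{aligned}
\]

Hence,

\[ (\hat f a)(x^1, \dots, x^n) = (a^1_j x^j, a^2_j x^j, \dots, a^m_j x^j) \]

and the matrix of $\hat f a$ is exactly $[a^i_j]$. So whatever bases we choose for $X_1$
and $X_2$, they give the same representation for $\hat f a$ as for $a$.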

Notice that $\hat f$ carries the contraction function $\tilde C : X^* \otimes X \to \mathbb{R}$ to the
trace function $\operatorname{tr} : L(X; X) \to \mathbb{R}$, since $\tilde C a = a^i_i = \operatorname{tr}(\hat f a)$ in any coordinate
system. As $\hat f$ can safely be used to identify the two spaces, this gives a more
intrinsic and coordinate-free way of thinking about the trace than we had
in I.3.14, but it remains heavily algebraic.
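
The correspondence between contraction and trace can be checked directly on a sum of simple tensors. This is a sketch in Python, assuming $X = \mathbb{R}^3$ with the standard basis, functionals given by components, and helper names that are ours.

```python
def contract(terms):
    """Contraction on a sum of simple tensors g (x) x in X* (x) X: the sum of g(x)."""
    return sum(sum(gi * xi for gi, xi in zip(g, x)) for g, x in terms)

def matrix(terms, n):
    """Matrix of the corresponding operator: entry [i][j] is the sum of x[i] * g[j]."""
    return [[sum(x[i] * g[j] for g, x in terms) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

# Two simple tensors (g, x), with sample values.
terms = [([1, 2, 0], [3, 0, 1]),
         ([0, -1, 4], [2, 5, 5])]

print(contract(terms), trace(matrix(terms, 3)))
```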
(If $A$ is thought of as an “infinitesimal operator”, using the differential
structure of $L(X; X)$, then $\operatorname{tr} A$ becomes an “infinitesimal change of determinant”.
We shall discuss a precise formulation in a later volume in the context
of Lie groups and their Lie algebras, or see [Porteous]. That allows a geometric
interpretation of trace, exploited implicitly in IX.§6 below, but not a
universally applicable one.)

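1.13. Tensors on Metric Spaces. Suppose that $X$ has a metric tensor $G$.
Then isomorphisms

\[ G_\downarrow : X \to X^* , \qquad G_\uparrow : X^* \to X \]

give rise to isomorphisms between various of the spaces constructed from $X$.
For example,

\[ I \otimes I \otimes G_\uparrow \otimes G_\downarrow \otimes G_\uparrow \otimes I \otimes G_\uparrow : \]

\[ X \otimes X^* \otimes X^* \otimes X \otimes X^* \otimes X \otimes X^* \to X \otimes X^* \otimes X \otimes X^* \otimes X \otimes X \otimes X \]

and so forth. In general we have an isomorphism

\[ X^k_h \to X^{k'}_{h'} \]

preserving the order in which tensor products are taken, whenever

\[ k + h = k' + h' . \]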

If the metric has been fixed once and for all, these can be used, for instance, 
to make all tensors entirely contravariant. This might seem a simplification, 
but it is not. For example, velocity at a point arises as a contravariant 
vector. The gradient of a potential at a point arises as a covariant one 
and the contours of the functional (cf. III. 1.02 and VII. 1.02) are the local 
linear approximation to those of the potential. Similar things happen for 
higher degrees. So it is better to keep dual objects distinguished, using the 
isomorphisms when convenient, rather than let them merge into the One 
Void: the goal of physics is not Nirvana. 

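The formulae for these isomorphisms come straight from those for $G_\downarrow$
and $G_\uparrow$ (IV.3.02). In general, let $A : X \to Y$ and $A' : X' \to Y'$ have matrices
$[a^i_j]$ and $[a'^k_l]$ with respect to bases $\{b_1, \dots, b_n\}$, $\{b'_1, \dots, b'_{n'}\}$, $\{c_1, \dots, c_m\}$
and $\{c'_1, \dots, c'_{m'}\}$ for $X$, $X'$, $Y$, $Y'$ respectively. Then for $x \in X \otimes X'$ we
have

\[
\begin{aligned}
(A \otimes A')(x) &= (A \otimes A')\bigl((b_j \otimes b'_l)\, x^{jl}\bigr) \\
 &= (A b_j \otimes A' b'_l)\, x^{jl} \\
 &= \bigl((c_i a^i_j) \otimes (c'_k a'^k_l)\bigr)\, x^{jl} \\
 &= (c_i \otimes c'_k)\, a^i_j\, a'^k_l\, x^{jl} .
\end{aligned}
\]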
So the $nn' \times mm'$ entries of $[A \otimes A']$ are just the products $a^i_j a'^k_l$, and so on
for the higher orders. Thus, for instance, the isomorphism

\[ \Theta = I \otimes I \otimes G_\uparrow \otimes G_\downarrow \otimes G_\uparrow \otimes I \otimes G_\uparrow \]

at the beginning of this section has the formula

\[ (\Theta x)^{i}{}_{j}{}^{k'}{}_{l'}{}^{m'np'} = g^{k'k}\, g_{l'l}\, g^{m'm}\, g^{p'p}\; x^{i}{}_{jk}{}^{l}{}_{m}{}^{n}{}_{p} . \]

Application of these isomorphisms is known as “raising and lowering
indices”, for obvious reasons. This lies behind our notations $G_\downarrow$ and $G_\uparrow$.
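
Raising and lowering can be tried out on a small example. The Python sketch below assumes $X = \mathbb{R}^2$ with a sample metric whose components `g_lower` and inverse `g_upper` are chosen by us; the function names are illustrative. It lowers and raises the index of a vector, and raises then lowers the second index of a tensor with components $x^i{}_j$.

```python
g_lower = [[2, 1], [1, 1]]      # components g_ij of a sample metric tensor
g_upper = [[1, -1], [-1, 2]]    # components g^ij: the inverse matrix
n = 2

def lower_index(v):
    """v^j -> v_i = g_ij v^j."""
    return [sum(g_lower[i][j] * v[j] for j in range(n)) for i in range(n)]

def raise_index(f):
    """f_j -> f^i = g^ij f_j."""
    return [sum(g_upper[i][j] * f[j] for j in range(n)) for i in range(n)]

def raise_second(x):
    """x^i_j -> x^{ik} = g^{kj} x^i_j, keeping the order of the two slots."""
    return [[sum(g_upper[k][j] * x[i][j] for j in range(n)) for k in range(n)]
            for i in range(n)]

def lower_second(x):
    """x^{ij} -> x^i_k = g_{kj} x^{ij}."""
    return [[sum(g_lower[k][j] * x[i][j] for j in range(n)) for k in range(n)]
            for i in range(n)]

v, w = [3, -2], [1, 5]
x = [[1, 2], [3, 4]]

v_low = lower_index(v)
# The lowered vector, applied to w, gives the inner product of v and w.
pairing = sum(v_low[i] * w[i] for i in range(n))
inner = sum(v[i] * g_lower[i][j] * w[j] for i in range(n) for j in range(n))
print(v_low, raise_index(v_low), pairing, inner)
```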

One of these isomorphisms is significant enough to merit special mention.
It gives us the composite

\[ \Psi : L(X; X) \xrightarrow{\ \hat f^{\leftarrow}\ } X^* \otimes X \xrightarrow{\ I \otimes G_\downarrow\ } X^* \otimes X^* = L^2(X; \mathbb{R}) . \qquad \text{(cf. 1.03)} \]

Here $\hat f$ is as in Lemma 1.08, so we have an isomorphism between the space
of operators and that of bilinear forms. If $A \in L(X; X)$ has matrix $[a^i_j]$ and
$F = \Psi A$ has components $f_{kl}$, then we have the formula

\[ f_{kl} = g_{ki}\, a^i_l . \]

In fact, $\Psi$ is most clearly represented otherwise by the formulation
(Exercise 10a)

\[ L(X; X) \to L^2(X; \mathbb{R}) : A \mapsto [(x, y) \mapsto Ax \cdot y] . \]

In this form it is easy to prove (Exercise 10b) that $A$ is non-singular if and
only if $\Psi A$ is non-degenerate, and that $A$ is self-adjoint if and only if $\Psi A$ is
symmetric.
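
The two equivalences of Exercise 10b can be explored numerically, without replacing the proof. In this Python sketch we assume $X = \mathbb{R}^2$ with a sample metric `G`, and compute the components of the bilinear form $(x, y) \mapsto Ax \cdot y$ by evaluating it on basis vectors, which avoids committing to an index convention. All names and sample matrices are ours.

```python
G = [[2, 1], [1, 1]]             # sample metric tensor components
BASIS = [[1, 0], [0, 1]]

def dot(x, y):
    """Inner product x . y defined by G."""
    return sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(2))

def mat_vec(a, x):
    """Ordinary matrix-times-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def psi(a):
    """Array of values (A b_k) . b_l of the bilinear form (x, y) -> Ax . y."""
    return [[dot(mat_vec(a, bk), bl) for bl in BASIS] for bk in BASIS]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_symmetric(m):
    return m[0][1] == m[1][0]

def is_self_adjoint(a):
    """Ax . y == x . Ay on all pairs of basis vectors."""
    return all(dot(mat_vec(a, x), y) == dot(x, mat_vec(a, y))
               for x in BASIS for y in BASIS)

A_self_adjoint = [[-1, 2], [3, -2]]   # self-adjoint with respect to G
A_other = [[1, 1], [0, 1]]            # non-singular, not self-adjoint
A_singular = [[1, 2], [2, 4]]

for a in (A_self_adjoint, A_other, A_singular):
    print(is_self_adjoint(a), is_symmetric(psi(a)), det2(a), det2(psi(a)))
```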

This equivalence makes it seem that perhaps the separate proofs for
the diagonalisation of symmetric operators (IV.4.06) and of the symmetric
bilinear forms (IV.3.05; only the “orthonormal” condition on the basis vectors
requires non-degeneracy) were superfluous, and that one should be deducible
from the other straight off. However, one involves a basis orthonormal with
respect to $G$, the other a basis orthonormal with respect to $\Psi A$, which can be
any bilinear form at all ($\Psi$ being surjective) so that the two are not closely
related. Moreover, if $G$ is indefinite IV.4.06 is false (IV.4.12) but IV.3.05
remains true, so no close relation can be expected.


1.14. Geometry. The reader may have noticed a scarcity of pictures in this 
chapter. This is not because tensors are un-geometric. It is because they are 
so geometrically various. They include vectors, linear functionals, metric tensors,
the “volume” form 1.02(i) (cf. also Exercise 11) and nearly everything
else we have looked at so far. That all of these wrap up in the same algebraic
parcel is a great convenience, but it does mean that geometrical interpretations
must attach to particular types of tensor, not to the tensor concept.
We shall provide such interpretations, as far as possible, as we proceed. 


Exercises V.1

1. Define addition and scalar multiplication of multilinear maps, by
analogy with IV. Exercise 1.4 for the bilinear case, and prove that
$L(X_1, \dots, X_n; Y)$ is then a vector space.

2. a) Prove that f as defined in 1.02(i) is multilinear, via Euclidean “base ×
height” arguments on the volume of parallelepipeds.

b) Prove from the definitions of addition and scalar multiplication of 
maps that f as defined in 1.02(ii) is multilinear. 

3. Prove from the definitions that the map $\otimes$ of 1.03 is multilinear.

4. a) By choosing bases for $X_1, \dots, X_n$, show that the set

\[ \otimes(X_1^* \times X_2^* \times \cdots \times X_n^*) \text{ spans } L(X_1, \dots, X_n; \mathbb{R}) . \]

b) Check Ti) in 1.03.

c) Prove that the set of all possible tensor products of the form

\[ b^{i_1} \otimes b^{i_2} \otimes \cdots \otimes b^{i_n} , \]

where each $b^{i_j}$ is a vector in the basis chosen in part (a) for $X_j^*$, is a
basis for $X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^* = L(X_1, \dots, X_n; \mathbb{R})$.

(Essentially, the argument is the same as for the case $X_1 = X_2 = \mathbb{R}^2$
of 1.03.)

Deduce that $\dim(X_1^* \otimes X_2^* \otimes \cdots \otimes X_n^*) = \dim(X_1^*) \dim(X_2^*) \cdots \dim(X_n^*)$.

d) By examining its necessary values on basis elements, prove the existence
and uniqueness of the map $\hat f$ in Tii) of 1.03.

5. Prove that the tensor products of functionals defined as in 1.02
and 1.03 satisfy TA) and TS) of 1.06; deduce that the tensor products
of vectors do likewise.

6. a) Prove from equations TA) and TS) that

\[ \left(x\tfrac12 + x'\tfrac12\right) \otimes (y + y') + \bigl((x - x') \otimes (y - y')\bigr)\tfrac12 = x \otimes y + x' \otimes y' . \]

b) Prove that if $x \otimes y = x' \otimes y'$, then $x = ax'$, $y' = ay$ for some $a \in \mathbb{R}$.
7. a) Prove that if we define $(f(g, x_2))x_1 = x_2(g(x_1))$, $x_1 \in X_1$, then

\[
\begin{aligned}
f(g + g', x_2) &= f(g, x_2) + f(g', x_2) \\
f(g, x_2 + x_2') &= f(g, x_2) + f(g, x_2') \\
f(ga, x_2) &= (f(g, x_2))a = f(g, x_2 a) ,
\end{aligned}
\]

as linear maps

\[ X_1 \to X_2 . \]

b) Show that any finite sum

\[ t = g \otimes x + g' \otimes x' + \cdots \]

is equal to a similar expression with all the $x, x', \ldots$ linearly
independent. Deduce that if $\hat f(g \otimes x) = 0 \Rightarrow g \otimes x = 0$, then
$\hat f(t) = 0 \Rightarrow t = 0$, so that $\hat f$ is injective.

8. a) Prove from Ti) and Tii) that

\[ (X_1 \otimes \cdots \otimes X_n) \otimes (Y_1 \otimes \cdots \otimes Y_m) \cong (X_1 \otimes \cdots \otimes X_n \otimes Y_1 \otimes \cdots \otimes Y_m) . \]

b) Prove that for any permutation $m$ (Chapter I.3.06) if we define

\[ M : X_1 \otimes X_2 \otimes \cdots \otimes X_n \to X_{m_1} \otimes X_{m_2} \otimes \cdots \otimes X_{m_n} \]

on simple elements by

\[ M(x_1 \otimes x_2 \otimes \cdots \otimes x_n) = x_{m_1} \otimes x_{m_2} \otimes \cdots \otimes x_{m_n} \]

then $M$ is well defined and an isomorphism.

9. Check that contraction is well defined, in that tensors equal by T A) 
and T S) go to equal tensors. 

10. a) Prove that if $\Psi$ is the composite isomorphism from 1.13

\[ L(X; X) \to X^* \otimes X \to X^* \otimes X^* = L^2(X; \mathbb{R}) \]

then for any $A : X \to X$, we have $\Psi A(x, y) = Ax \cdot y$.
b) Prove that $\Psi A$ is non-degenerate (respectively, symmetric) if and only
if $A$ is non-singular (respectively, self-adjoint).

11. A multilinear map $f : X \times \cdots \times X \to Y$ is skew-symmetric if

\[ f(\ ,\dots, u, \dots, v, \dots,\ ) = -f(\ ,\dots, v, \dots, u, \dots,\ ) \]

whenever we fill in the empty spaces, for $u$, $v$ in any positions. (A
linear functional is regarded as skew-symmetric and symmetric, trivially.)

a) The set of skew-symmetric $k$-linear forms on $X$ is a vector space. We
denote it by $\Lambda^k X$. (That it is a subspace of $T_k X$ not $T^k X$ is to do with
the cultural barriers between mathematicians and physicists: $\Lambda^k X$ is
largely used by mathematicians, thinking of the meanings for “co-”
and “contravariant” that (III.1.03) we have chosen to avoid.)

b) If $(b^1, b^2, b^3)$ is a basis for $X^*$, then a basis for $\Lambda^2 X$ is

\[ (b^1 \otimes b^2 - b^2 \otimes b^1,\ b^1 \otimes b^3 - b^3 \otimes b^1,\ b^2 \otimes b^3 - b^3 \otimes b^2) . \]

c) Find a general way of writing a basis for $\Lambda^k X$, where $X^*$ has the
basis $(b^1, \dots, b^n)$. (The notation of I.3.06 should help.) Deduce that
$\dim(\Lambda^k X) = \binom{n}{k}$, the number of combinations of $k$ things chosen out
of $n$. In particular $\dim(\Lambda^n X) = 1$ and $\dim(\Lambda^k X) = 0$ for $k > n$.

d) Since $\Lambda^n X$ is one-dimensional, for any $A \in L(X; X)$ the operator

\[ \otimes^n A^* = \underbrace{A^* \otimes \cdots \otimes A^*}_{n \text{ times}} : X^* \otimes \cdots \otimes X^* \to X^* \otimes \cdots \otimes X^* \]

restricted to $\Lambda^n X$ (prove that we can so restrict it by showing that
$\otimes^n A^*(f)$ is skew-symmetric when $f$ is) is just scalar multiplication
by some scalar $c(A)$. Show that if $b_1, \dots, b_n$ is any basis for
$X$, $f \in \Lambda^n X$ non-zero, and $A$ an operator on $X$, then
$c(A) = f(Ab_1, \dots, Ab_n)/f(b_1, \dots, b_n)$.

e) Find $c(A)$ explicitly, and deduce that $c(A) = \det A$. Thus $\det A$ is
“what $A$ does to skew-symmetric $n$-linear forms.”

f) Why is “skew-symmetric $n$-linear form” the natural notion of “volume
measure” on an $n$-dimensional vector space? (cf. 1.02(i), I.3.05 and
Exercise 2)

12. If we have non-zero $f \in \Lambda^n X$, $g \in \Lambda^n Y$, for $X$, $Y$ $n$-dimensional
and $A : X \to Y$, define $\det(gA/f) = g(Ab_1, \dots, Ab_n)/f(b_1, \dots, b_n)$,
where $\beta = (b_1, \dots, b_n)$ is any basis for $X$.

a) Show that this definition is independent of $\beta$.

b) How could the definition be made without reference to a basis?

c) Show that if we choose bases $b_1, \dots, b_n$ for $X$, $c_1, \dots, c_n$ for $Y$ such
that $f(b_1, \dots, b_n) = 1 = g(c_1, \dots, c_n)$, then $\det(gA/f)$ is given by
the usual formula from the corresponding matrix for $A$.
“That which gives things their suchness
Cannot be delimited by things. 

So when we speak of “limits” we remain confined 
to limited things.” 

Chuang Tzu 


1. Continuity 

When we use logarithms for practical calculations, we rarely know exactly the 
numbers with which we are working; never, if they result from any physical 
operation other than counting. However if the data are about right, so is the 
answer. To increase the accuracy of the answer, we must increase that of the 
data (and perhaps, to use this accuracy, refer to log tables that go to more 
figures). In fact for any required degree of accuracy in the final answer, we 
can find the degree of accuracy in our data which we would need in order 
to guarantee it - whether or not we can actually get data that accurate. 
The same holds for most calculations, particularly by computer. Errors may 
build up, but sufficiently accurate data will produce an answer accurate to 
as many places as required. (The other side of this coin is summarised in the 
computer jargon GIGO - “Garbage In, Garbage Out”.) 

On the other hand, suppose our calculation aims to predict what is
going to happen to a spherical ball $B$ of mixed $U^{235}$ and $U^{238}$ in a certain
ratio $k$, and that for this shape and ratio, theory says that critical mass
is exactly $9\tfrac14$ kg. Assume we have found the mass of $B$ to three significant
figures as 9.25 kg. Now 9.25 kg is the mass “to three significant figures” but
that means exactly that it could be up to $5 \times 10^{-3}$ kg more or less than
$9\tfrac14$ kg precisely. And depending on where in that range it is, we have either
a bomb or a melting lump of metal, and we cannot calculate which from our 
measurements. If we knew the mass more accurately as 9.250 kg, to four 
significant figures, we would have the same problem. Around the critical 
mass, no degree of accuracy in our knowledge of the mass (even ignoring the 
fact that at a really accurate level everything becomes probabilistic anyway), 
will guarantee that the energy output is within the laboratory rather than 
the kiloton range. The accuracy of our computed answer breaks down in 
spectacular fashion. The function 

\[ f : \mathbb{R} \to \mathbb{R} \]

where $f(x)$ is the energy output in the next minute of such a ball of uranium
of mass $x$, is discontinuous at the critical mass. Our useful general ability to
guarantee any required level of accuracy in the answer by reducing possible
errors in the data far enough does not apply here. Around any mass definitely
less than the critical one (by however little) we can get an answer close to
what happens if we can reduce our measurement error to less than that little;
similarly with a definitely greater mass than critical. This is not possible
around the critical mass itself. Thus $f$ is continuous everywhere except at
the critical mass, both in the intuitive sense and according to the following
definition.

1.01. Definition. A function $f : \mathbb{R} \to \mathbb{R}$ is continuous at $x \in \mathbb{R}$ if for any
positive number $\varepsilon$ (however small) there exists a positive number $\delta$ such that
if

\[ |y - x| < \delta \text{ then } |f(y) - f(x)| < \varepsilon . \]

(Notice the requirement that $\delta$ must not be zero; zero would always
work, since

\[ |x - y| = 0 \Rightarrow x - y = 0 \Rightarrow x = y \Rightarrow f(x) = f(y) \Rightarrow |f(x) - f(y)| < \varepsilon , \]

but from where could we obtain infinitely accurate data? It is in fact a
theorem that to get them would take infinite energy.)

The use of $\varepsilon$ and $\delta$ in this context is among the most standard notations
in all of mathematics (to the point where the word “epsilontics” has been
coined for complicated continuity proofs). Which symbol is used where can
be remembered by the observation that $\varepsilon$ is the maximum allowable error in
the end result of applying $f$; that condition we can satisfy by making the
error in the $\delta$ata less than $\delta$. The fun of the game lies in the diversity of
continuous functions for which $\delta$ depends intricately on $\varepsilon$.

If for some choice of $\varepsilon$ (and hence for all smaller choices) no such $\delta$ exists,
$f$ is by definition discontinuous at $x$. There may be such a $\delta$ for some $\varepsilon$ (for
all sufficiently large $\varepsilon$, for $f$ as indicated by the graph in Fig. 1.1) but continuity
requires that for each $\varepsilon$ there must be a $\delta$. (Cf. also Exercise 1)
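
To see the definition at work, here is a Python sketch under our own assumptions: the squaring function stands in for a continuous $f$, with the choice $\delta = \min(1, \varepsilon/(2|x| + 1))$, and a step function jumping at 9.25 is a toy stand-in for the energy-output function above. The check is a spot-check on sample points, not a proof.

```python
def delta_for_square(x, eps):
    """A delta that works for f(y) = y*y at x, since
    |y*y - x*x| = |y - x| * |y + x| < delta * (2|x| + 1) <= eps when delta <= 1."""
    return min(1.0, eps / (2 * abs(x) + 1))

def holds_on_samples(f, x, eps, delta, samples=1000):
    """Spot-check: every sampled y with |y - x| < delta has |f(y) - f(x)| < eps."""
    for s in range(1, samples):
        y = x - delta + 2 * delta * s / samples
        if abs(y - x) < delta and not abs(f(y) - f(x)) < eps:
            return False
    return True

def square(y):
    return y * y

def step(y, critical=9.25):
    """Toy discontinuous function: 0 below the critical value, 1 from it upward."""
    return 0.0 if y < critical else 1.0

eps = 1e-3
print(holds_on_samples(square, 3.0, eps, delta_for_square(3.0, eps)))
# At the jump, eps = 0.5 defeats every delta we try.
print([holds_on_samples(step, 9.25, 0.5, d) for d in (1.0, 1e-3, 1e-6)])
# Away from the jump, a small enough delta works even for a tiny eps.
print(holds_on_samples(step, 9.0, 1e-9, 0.2))
```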

This definition generalises immediately to a much wider context, with 
the help of a further term: 
1.02. Definition. A metric (or distance function) on a set $X$ is a function

\[ d : X \times X \to \mathbb{R} \]

satisfying

i) $d(x, y) = d(y, x)$

ii) $d(x, y) = 0$ if and only if $x = y$

iii) $d(x, z) \le d(x, y) + d(y, z)$.

Plainly these are reasonable properties for a “distance from x to y” 
function. Condition (iii) is called the triangle inequality , since the lengths 
of the sides of plane triangles give the most familiar examples. Notice the 
differences from II.1.01. 

The pair $(X, d)$ is a metric space: as usual for a set-plus-structure we
shall often denote it by just $X$ where we can do so without confusion.

Every set $X$ has the trivial metric: $d(x, x) = 0$, $d(x, y) = 1$ if $x \ne y$.
Whereas a metric may be defined on any set, a metric tensor is defined 
only on a vector space. Part of the connection between the two is indicated 
in Exercise 2, but some of it must wait until we consider manifolds. 

A function $g : X \times X \to \mathbb{R}$ that satisfies (i) and (iii), but, instead of (ii),
only

(ii)* $g(x, x) = 0$ for all $x$, but $g(x, y)$ may be zero for $x \ne y$

is a semimetric or pseudometric , and ( X , g) is a semimetric or pseudometric 
space . Thus, every metric is a semimetric. Moreover, every semimetric has 
a non-negative image in R, as may be seen by putting x = z in (iii) and 
using (i). 
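
On a finite set of points the axioms can simply be checked case by case. The Python sketch below uses names and sample data of our own: the trivial metric, and the function $g(x, y) = |x_1 - y_1|$ on pairs of numbers, which ignores the second coordinate and is therefore a semimetric but not a metric. A finite check of this kind illustrates the definitions; it proves nothing about infinite sets.

```python
from itertools import product

def is_semimetric(points, d):
    """Check (i), (ii)* and (iii) on every pair or triple drawn from `points`."""
    symmetric = all(d(x, y) == d(y, x) for x, y in product(points, repeat=2))
    zero_on_diagonal = all(d(x, x) == 0 for x in points)
    triangle = all(d(x, z) <= d(x, y) + d(y, z)
                   for x, y, z in product(points, repeat=3))
    return symmetric and zero_on_diagonal and triangle

def is_metric(points, d):
    """A semimetric that also separates distinct points, as (ii) requires."""
    separates = all(d(x, y) != 0
                    for x, y in product(points, repeat=2) if x != y)
    return is_semimetric(points, d) and separates

def trivial(x, y):
    return 0 if x == y else 1

def first_coordinate(x, y):
    return abs(x[0] - y[0])

points = [(0, 0), (0, 1), (2, 5), (3, -1)]

print(is_metric(points, trivial))
print(is_semimetric(points, first_coordinate), is_metric(points, first_coordinate))
# Non-negativity, as derived in the text from (i) and (iii).
print(all(first_coordinate(x, y) >= 0 for x, y in product(points, repeat=2)))
```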

Caution: Normally it causes no confusion when “metric tensor” is abbreviated
to “metric”. However, (Exercise 2) a definite metric tensor gives
rise to a metric in the sense just defined. In this context it is safer (but 
not usual) to refer to the latter as a nontensorial metric, to emphasise the 
distinction. A similar situation occurs on Riemannian manifolds (cf. IX.3.10 
and 4.03). 

1.03. Definition. A function $f : X \to Y$ between metric spaces $(X, d)$ and
$(Y, d')$ is continuous at $x \in X$ if for any $0 < \varepsilon \in \mathbb{R}$ there exists $0 < \delta \in \mathbb{R}$
such that if $d(x, y) < \delta$ then $d'(f(x), f(y)) < \varepsilon$.

Fig. 1.3

If $f$ is continuous at all $x \in S$, where $S \subseteq X$, $f$ is continuous on $S$. If
$S = X$, we just call $f$ continuous.

(Notice that this coincides with 1.01 when $\mathbb{R}$ is given the natural metric
$d(x, y) = |x - y|$.)

Now, another way of phrasing 1.03 is to say that the image under $f$ of
the set $\{\, y \mid d(x, y) < \delta \,\}$, called the open ball $B(x, \delta)$ of radius $\delta$ around $x$,
is inside $B(f(x), \varepsilon)$, similarly defined and named. The reason for the word
“ball” is obvious from Fig. 1.4. There it is illustrated for maps $f : \mathbb{R} \to \mathbb{R}$,
$g : \mathbb{R}^2 \to \mathbb{R}^2$ and $h : \mathbb{R}^2 \to \mathbb{R}^3$ with the usual notion of distance. (We shall see
later that other notions can be important.) “Open” refers to the fact that
all the points in a ball $B(x, \delta)$ are strictly inside it, in the following sense. If
$y \in B(x, \delta)$, so that $d(x, y) = r < \delta$, then by the triangle inequality all points
in $B(y, \delta - r)$ are in $B(x, \delta)$ too. Hence $y$ is completely surrounded by points
in $B(x, \delta)$ (Fig. 1.3). So $B(x, \delta)$ has no points in it of what it is natural to
call its boundary (cf. 1.04 below). By association of ideas this “not including 
a boundary” is thought of as being “unfenced” and hence “open”. (See also 
Exercise 3c; this is motivation from English usage, however, which is always 
warped by making it precise enough for mathematics - a sort of uncertainty 
principle, perhaps. The reader would do well to forget the motivation in 
favour of the defined meaning as soon as he can: a crutch is useful to get you 
walking, but once your leg has healed it stops you running, and you should 
throw it away.) In the course of defining this language precisely, we extend 
it to other sets also: 

1.04. Definition. A boundary point, or point of closure, of a set $S$ in a metric
space $X$ is a point $x$ such that for any $0 < \delta \in \mathbb{R}$, $B(x, \delta)$ contains both points
in $S$ and points not in $S$.

Fig. 1.4

The boundary $\partial S$ of the set $S$ is the set of boundary points of $S$. The
set $S$ is open if any boundary points it has are not contained in it, closed if
all boundary points it has are contained in it.

(Notice that since $\emptyset$ has no points, no $B(x, \delta)$ for any $x$ or $\delta$ contains
points of it, so it has no boundary points. Since it contains all the boundary
points it has, $\emptyset$ is closed; since it contains no boundary points, it is open. By
a similar argument, the whole space $X$ is both open and closed. This is one
point where the crutch of common usage is a hindrance more than a help.)
The closure $\bar S$ of $S$ is the set $S \cup \partial S$. (cf. Exercise 3f)

If you have not met these terms before, you should do Exercise 3 before 
going much further, to learn what they mean in practice. The definitions 
alone cannot give you the flavour, and as we are not writing a topology book 
we cannot roll them fully around the tongue in the text. 

Now, we sometimes want to talk about continuity when we do not have
a natural choice of metric, and to suppose we had would confuse things thoroughly.
Most notably this occurs on indefinite metric vector spaces (compare
and contrast Exercise 2b and Exercise 2c). It is precisely this false supposition
of a metric in models of spacetime which still sustains a lot of innumerate
or semi-numerate “Philosophers Of Science” in their belief in a twins “paradox”
(cf. Chapter 0.5.3). To avoid this confusion it is convenient to have a
“continuity structure” separate from particular choices of metric. This kind
of structure is called a topology, and we shall define it in a moment in 1.07.
Moreover, just as leaving out bases can greatly clarify some parts of linear
algebra, the separation of continuity from specific metrics proved such a powerful
tool that topologies have become as central to modern mathematics and
physics as vector spaces. Before the definition, we shall prove a lemma which
says essentially that the roundness of the balls $B(x, \delta)$ used is irrelevant to
the definition of continuity; all that matters is their openness.

Fig. 1.5

1.05. Lemma. A function f : X —>Y between two metric spaces is contin¬ 
uous at x £ X if and only if for any open set V containing f(x), there is an 
open set U containing x such that f(U) C V. 



Fig. 16 
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Fig. 1*7 


Proof. 

(i) Suppose $f$ is continuous at $x$.

Then since $V$ is open and $f(x) \in V$, there exists $0 < \varepsilon \in \mathbf{R}$
such that $B(f(x),\varepsilon) \subseteq V$ (Exercise 3b).

Now, by continuity of $f$ there exists $0 < \delta \in \mathbf{R}$ such that
$d(x,y) < \delta \Rightarrow d(f(x),f(y)) < \varepsilon$. Hence
$f(B(x,\delta)) \subseteq B(f(x),\varepsilon) \subseteq V$, and $B(x,\delta)$ is
open (Exercise 3a), so if we set $U = B(x,\delta)$ we are done.

(ii) Suppose for each open set $V$ containing $f(x)$, there is an open set $U$
containing $x$ such that $f(U) \subseteq V$.

Then in particular, since each $B(f(x),\varepsilon)$ is open (Exercise 3a), there
is an open set $U$ such that $f(U) \subseteq B(f(x),\varepsilon)$, with $x \in U$.
Hence by Exercise 3b there exists $0 < \delta \in \mathbf{R}$ such that
$B(x,\delta) \subseteq U$. But then

$f(B(x,\delta)) \subseteq f(U) \subseteq B(f(x),\varepsilon)$.

Since this can be done for any $\varepsilon$, $f$ is continuous at $x$. □

1.06. Corollary. A map $f : X \to Y$ between metric spaces is continuous if
and only if $f^{\leftarrow}(V)$ is open for each open set $V$ in $Y$.

Proof. Suppose $f$ is continuous.

Then it is continuous at each $x \in X$, and in particular at each
$x \in f^{\leftarrow}(V)$. Now, since $V$ is open there exists by the above some
open set $U$ containing $x$, for each such $x$, such that $f(U) \subseteq V$.
Hence by Exercise 3b applied to $U$, for each $x$ we have some $\delta$ such that
$B(x,\delta) \subseteq U \subseteq f^{\leftarrow}(V)$. Now by Exercise 3b applied
to $f^{\leftarrow}(V)$, $f^{\leftarrow}(V)$ must be open.

Similarly for the converse. □

We are now able to capture the essential aspects of continuity, with no 
irrelevancies, in the following definitions. 

1.07. Definition. A topology on a set $X$ is a specification of which subsets
of $X$ are to be considered open. More set-theoretically, it is a family $T$ of
subsets of $X$, called the open sets of the topology, satisfying the axioms

OA) $\emptyset \in T$ and $X \in T$.

OB) For any finite family $\{ U_i \mid i = 1,\dots,n \}$ of open sets,
$\bigcap_{i=1}^{n} U_i$ is open.

OC) For any family (finite or infinite) $\{ U_\alpha \mid \alpha \in A \}$ of
open sets, $\bigcup_{\alpha \in A} U_\alpha$ is open.

The topology is Hausdorff (pronounced “housed orff” and named after
the German mathematician F. Hausdorff (1868-1942)) if it satisfies one extra
axiom:

OD) For any two distinct points $x, y \in X$, there exist open sets $U, V \in T$
such that $x \in U$, $y \in V$, and $U \cap V = \emptyset$.

(Since we have agreed that open sets can be of any shape, not just
round balls, OD can be remembered by Fig. 1.8, which relates an English
meaning to the German pronunciation.) This is such a useful condition that
by “topology” we shall always mean “Hausdorff topology” unless otherwise
stated.
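
For a finite set the axioms can be checked mechanically, which may help to fix what they say. The following is a minimal sketch, not part of the original text: the function names `is_topology` and `is_hausdorff` and the three sample families are illustrative assumptions. Since the family is finite, it is enough to test OB and OC on pairs of sets.

```python
from itertools import combinations

def is_topology(X, T):
    """Check OA-OC for a finite family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    # OA: the empty set and X itself are open.
    if frozenset() not in T or X not in T:
        return False
    # OB and OC: for a finite family it is enough to test pairs,
    # since longer intersections and unions follow by induction.
    for U, V in combinations(T, 2):
        if U & V not in T or U | V not in T:
            return False
    return True

def is_hausdorff(X, T):
    """Check OD: distinct points have disjoint open neighbourhoods."""
    T = [frozenset(U) for U in T]
    for x, y in combinations(sorted(X), 2):
        if not any(x in U and y in V and not (U & V) for U in T for V in T):
            return False
    return True

X = {1, 2, 3}
discrete = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
not_a_topology = [set(), {1}, {2}, {1, 2, 3}]  # the union {1, 2} is missing

print(is_topology(X, discrete), is_hausdorff(X, discrete))
print(is_topology(X, nested), is_hausdorff(X, nested))
print(is_topology(X, not_a_topology))
```

The family `nested` satisfies OA-OC but not OD, because every open set containing 2 also contains 1; it is therefore a topology only in the wider, non-Hausdorff sense.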

The set X with the topology T is the topological space (X,T), as usual 
denoted by just X if only one topology has been mentioned for the set. (It is 
not unheard of to give a set as many as ten topologies at a time. We shall 
content ourselves throughout with one or two.) 

If $X$ is a metric (respectively, pseudometric) space, then the metric
(respectively, pseudometric) topology on $X$ is the topology consisting of the
open sets defined in 1.04 (cf. Exercise 4a). A metric topology is always
Hausdorff (Exercise 4a); a pseudometric topology in general is not
(Exercise 4b), which severely limits its usefulness. In general, many other
metrics serve equally well to define a given metric topology by giving rise
to the same open sets. We shall see this for vector spaces in §3.

The set $\mathbf{R}$ of real numbers will always in this book be assumed to have
the usual metric topology, given by the metric $d(x,y) = |x - y|$.

If $T$ is the metric topology corresponding to some metric on $X$ then the
topological space $(X,T)$ is metrisable. It can be useful arbitrarily to pick a
metric to give a handle on $T$ in computations, even when there is no natural
choice, just as an arbitrary basis can be convenient when computing with
a vector space. In particular, metrisability guarantees that $X$ is Hausdorff,
which is handy.

We can make the following extensions of the definitions in 1.04, showing
that the underlying concepts are of a topological rather than a metric nature.

1.08. Definition. A neighbourhood of a point $x$ in $X$, generally denoted
by $N(x)$ or some variation of it, is an open set containing $x$. The role of
open balls is taken over by neighbourhoods as we go to topology. (That this
take-over sacrifices nothing as far as continuity is concerned is the essence of
Lemma 1.05.)

A boundary point of a set $S$ in a topological space $X$ is a point $x$ such
that for any neighbourhood $N(x)$ of $x$, $N(x)$ contains both some points in $S$
and some points not in $S$ (Fig. 1.9).

The boundary $\partial S$ of $S$ is the set of boundary points of $S$.

A set $S$ is closed if it contains all its boundary points, or equivalently
(Exercise 5a) if $X \setminus S$ is one of the open sets of the topology. It is
important to note that a set may be neither open nor closed (cf. Exercise 3g).

The closure $\bar{S}$ of a set $S$ is the set $S \cup \partial S$ (cf. Exercise 5b).

We now have the framework in which we can define continuity in full 
generality, uncluttered by metrics. 



Fig. 1.9


1.09. Definition. A map $f : (X,T) \to (Y,\Sigma)$ between topological spaces is
continuous if

$V \in \Sigma \Rightarrow f^{\leftarrow}(V) \in T$.

This is just the reformulation we reached in 1.06. (cf. also Exercise 5c) 
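
On finite spaces Definition 1.09 can be tested directly by computing inverse images. The sketch below is an illustration only: the names `preimage` and `is_continuous`, and the sample spaces, are assumptions made for the example. The topologies used are deliberately not Hausdorff, since a finite Hausdorff space has every subset open and then every map out of it is continuous.

```python
def preimage(f, domain, V):
    """The inverse image of V: the points of the domain that f sends into V."""
    return frozenset(x for x in domain if f[x] in V)

def is_continuous(f, X, T, Sigma):
    """Definition 1.09: every open V in Sigma has an open inverse image in T."""
    T = {frozenset(U) for U in T}
    return all(preimage(f, X, V) in T for V in Sigma)

X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]
Y = {"a", "b"}
Sigma = [set(), {"a"}, {"a", "b"}]

f = {1: "a", 2: "a", 3: "b"}  # inverse image of {"a"} is {1, 2}, which is open
g = {1: "b", 2: "a", 3: "a"}  # inverse image of {"a"} is {2, 3}, which is not

print(is_continuous(f, X, T, Sigma))
print(is_continuous(g, X, T, Sigma))
```

Note that the test runs over the open sets of the target and looks backwards; nothing is required of the images $f(U)$ of open sets in the domain.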

1.10. Lemma. If $f : (X,T) \to (Y,\Sigma)$ and $g : (Y,\Sigma) \to (Z,\Pi)$ are
continuous maps, then so is $g \circ f : (X,T) \to (Z,\Pi) : x \mapsto g(f(x))$.

Proof.

$V \in \Pi \Rightarrow g^{\leftarrow}(V) \in \Sigma$ ($g$ continuous)

$\Rightarrow f^{\leftarrow}(g^{\leftarrow}(V)) \in T$ ($f$ continuous)

$\Rightarrow (g \circ f)^{\leftarrow}(V) \in T$ (same set).

□

The shortness of this proof illustrates the power of the topological
viewpoint. Proving the same thing for the more limited case of continuous maps
between metric spaces is actually harder, with assorted $\varepsilon$'s and
$\delta$'s (write it out and see how messy it looks!) but that result is
implied by this one and 1.06.

1.11. Definition. A map $f : X \to Y$ between topological spaces is a
homeomorphism if it is continuous, bijective and its inverse is also
continuous (cf. Exercise 7). (There is obviously a homeomorphism between the
shapes of Fig. 1.10, though it cannot preserve distances: we cannot have
$d(f(x),f(y)) = d(x,y)$ in general. This is another reason why continuity is
most naturally considered in topological rather than metric terms, and the
reason for the name “india-rubber geometry” for topology.) If there is a
homeomorphism between two spaces they are homeomorphic.



Fig. 1.10 


1.12. Lemma. If $T$, $\Sigma$ are topologies on $X$, and the identity map
$I_X : X \to X : x \mapsto x$ is a homeomorphism from $(X,T)$ to $(X,\Sigma)$,
then $T = \Sigma$. (This saves a lot of work when showing that different
definitions give the same topology.)

Proof. $V \in T \Rightarrow I_X(V) \in \Sigma \Rightarrow V \in \Sigma$
($I_X^{\leftarrow}$ continuous).

$U \in \Sigma \Rightarrow (I_X)^{\leftarrow}(U) \in T \Rightarrow U \in T$
($I_X$, which is $I_X^{\leftarrow}$ again anyway, continuous). So $T = \Sigma$ as
sets, and hence as topologies. □


Exercises VI.1

1. Show that if, for some function $f : \mathbf{R} \to \mathbf{R}$, at some
$x \in \mathbf{R}$ we have continuity of $f$ at $x$ by virtue of the same choice
of $\delta$ for each $\varepsilon$, then $f$ is constant between $x - \delta$
and $x + \delta$.

2. a) Using the Schwarz inequality (IV.1.07) show that for any inner product
space $(X,G)$ and vectors $x, y \in X$,

$G(x + y, x + y) \le (\|x\|_G + \|y\|_G)^2$.

b) Show that for any inner product space $(X,G)$ the function

$d_G : X \times X \to \mathbf{R} : (x,y) \mapsto \|x - y\|_G$

is a (non-tensorial) metric on $X$. (For the triangle inequality, apply (a)
to $\|(x - y) + (y - z)\|_G$.)

c) Show that if $G$ is an indefinite metric tensor, then $d_G$ is not even a
semimetric (consider, for example, the vectors (0,0), (1,0) and (1,1)
in $H^2$).

3. a) Show that each open ball $B(x,\delta)$ in a metric space is indeed open by
Definition 1.04. (Hint: Fig. 1.4.)

b) Show that $S \subseteq X$ is open in the metric space $X$ if and only if for
each point $x \in S$ there exists some $0 < \delta \in \mathbf{R}$ such that
$B(x,\delta) \subseteq S$.

c) Show that $S \subseteq X$ is open if and only if $X \setminus S$ is closed.
(If the boundary between two countries is a fortified wall - Hadrian’s Wall,
say, or the Great Wall of China - then the country $A$ that includes
the wall is closed to invasion by $B$, while $B$ is open to attacks from
$A$. In topology, $A$ and $B$ cannot have a wall each.)

d) Show that $\partial B(x,\delta) = \{ y \mid d(x,y) = \delta \}$ and that this
set, called the sphere $S(x,\delta)$ of radius $\delta$, centre $x$, is closed.

e) Show that the closed ball of radius $\delta$, centre $x$ and denoted by

$\bar{B}(x,\delta) = \{ y \mid d(x,y) \le \delta \}$

is the closure of the open ball $B(x,\delta)$.

f) The closure $\bar{S}$ of any set $S \subseteq X$ is closed, so justifying the
term “closure” (cf. Exercise 5b for the general case).

g) The set $\{ x \mid 0 < x \le 1 \}$ is neither open nor closed as a subset of
$\mathbf{R}$ with the usual metric.

h) Show that $\mathbf{R}$ itself is both open and closed as a subset of
$\mathbf{R}$.

i) Show that $\mathbf{R} \times \{0\}$ is closed but not open in the plane
$\mathbf{R} \times \mathbf{R}$ with the metric

$d((x,y),(x',y')) = +\sqrt{(x - x')^2 + (y - y')^2}$.

4. a) Show that a metric topology does indeed satisfy OA-OD, and a
pseudometric topology satisfies OA-OC.

b) If $x$, $y$ are distinct points in a pseudometric space $X$ such that
$d(x,y) = 0$, then any set open in the pseudometric topology that contains one
contains the other, so that $X$ is not Hausdorff.

c) The intersection of the infinite set $\{ B(0, 1 + \tfrac{1}{n}) \mid n \in \mathbf{N} \}$
of open balls in $\mathbf{R}$ is not open. Hence, OB cannot usefully be
strengthened.

5. a) A set $S$ in a topological space $X$ contains all its boundary points if
and only if the set $X \setminus S$ of points of $X$ not in $S$ is open.

b) The closure $\bar{S}$ of any set $S$ in a topological space $X$ is closed
(cf. Exercise 3f for metrisable spaces).

c) A map $f : X \to Y$ between topological spaces is continuous if and
only if for every closed set $C \subseteq Y$, $f^{\leftarrow}(C) \subseteq X$ is
also closed.

6. a) In a Hausdorff topology, each set $\{x\}$ containing only one point is
closed.

b) The continuous map $f : \mathbf{R} \to \mathbf{R} : x \mapsto x^2$ takes the
open set $U = \,]{-1},1[$ to a set $f(U)$ which is neither open nor closed.
(Note: for reasons of space we shall not go into the proofs that the elementary
functions - polynomials, log, sin, etc. - are continuous. The work
is not in proving continuity, but in defining the functions themselves
sufficiently precisely to prove anything at all. This is done in any
elementary analysis book.)

7. If $X$ is the set $]0,1] \subseteq \mathbf{R}$ and $Y$ is the unit circle
$\{ (x,y) \mid x^2 + y^2 = 1 \}$ in $\mathbf{R}^2$, both with the usual metric
topology given by Euclidean distance, then the map

$X \to Y : x \mapsto (\sin 2\pi x, \cos 2\pi x)$

which wraps $X$ once round $Y$ is a continuous bijection but not a
homeomorphism.

2. Limits 

The equipment that we have set up is very powerful. Notice that we have 
now two new kinds of object (metric and topological spaces) and allowable 
maps between them, to place alongside vector and affine spaces, with linear 
and affine maps. However the rule for allowing maps - continuity - is a little 
surprising. Instead of preserving, for instance, addition forwards as a linear 
map must do, to be continuous a map must preserve openness backwards
($f^{\leftarrow}$(open set) must be open) but not necessarily forwards (cf. Exercise 1.6b.
If it does carry open sets to open sets, $f$ itself is called open.) This grew
naturally out of our considerations of computability, but those were not the 
original motivation. The interest was more in something that is preserved 
forwards: the limit of a sequence. 

2.01. Definition. A mapping $S : \mathbf{N} \to X : i \mapsto S(i)$, where $X$ is
any set, is called a sequence of points in $X$. ($S(i)$ is often written $x_i$,
for reasons of tradition and convenience. As usual, $\mathbf{N}$ denotes the
natural numbers. We shall also continue to denote a neighbourhood of a point
by $N(x)$.)

Fig. 2.1

A sequence $S$ of points $x_i$ in a topological space $X$ has the point $x$ as a
limit if every neighbourhood $N(x)$ of $x$ contains $x_i$ for all but finitely
many $i \in \mathbf{N}$. (Hence after passing some $x_m$, where $m$ is the maximum
of the set $\{ i \mid x_i \notin N(x) \}$, which is required to be finite and
hence has a maximum, $S$ stays inside $N(x)$.) If $X$ is Hausdorff, $S$ can have
at most one limit (Exercise 1a) and we speak of the limit of $S$; it need not
have any limit in general (Exercise 1b).

If $S$ has the limit $x$, then $S$ is convergent and converges to $x$
(cf. Exercise 1d). We write for short that

$\lim(S) = \lim_{i \to \infty} x_i = x$.

If $S$ does not converge, we may still have a convergent subsequence $S'$ of
$S$. (Formally $S'$ is given in the form $S' = S \circ J$, where $J$ is any
order-preserving injective map $J : \mathbf{N} \to \mathbf{N}$. This just
codifies the obvious notion, and guarantees that $S'$ also will be an infinite
sequence.) We may have several subsequences of $S$ converging to different
points (Exercise 1b), but if $S$ itself converges, so do all its subsequences,
and to the same point (Exercise 1c).
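
The phrase “all but finitely many” can be made concrete for a particular sequence by listing the exceptional indices. The sketch below is illustrative only: the helper `exceptions`, the sequence $S(i) = 1/i$ in $\mathbf{R}$, and the choice $J(i) = i^2$ are assumptions made for the example, and a search over finitely many terms can of course illustrate convergence but never prove it.

```python
from fractions import Fraction

def exceptions(seq, x, eps, n_terms):
    """Indices 1 <= i <= n_terms whose term seq(i) lies outside B(x, eps)."""
    return [i for i in range(1, n_terms + 1) if abs(seq(i) - x) >= eps]

def S(i):
    # The sequence S(i) = 1/i, in exact rational arithmetic.
    return Fraction(1, i)

def J(i):
    # An order-preserving injective map N -> N, so S o J is a subsequence.
    return i * i

eps = Fraction(1, 10)
outside_S = exceptions(S, 0, eps, 1000)
outside_SJ = exceptions(lambda i: S(J(i)), 0, eps, 1000)

print(outside_S)    # the finitely many terms of S outside B(0, 1/10)
print(outside_SJ)   # likewise for the subsequence S o J
```

Among the first thousand terms, only the first ten terms of $S$ and the first three terms of $S \circ J$ lie outside the neighbourhood $B(0, 1/10)$, in keeping with both sequences converging to 0.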

2.02. Lemma. A function $f : X \to Y$ between topological spaces, where
$(X,T)$ is metrisable, is continuous if and only if it preserves limits;
formally, if and only if for any sequence of points in $X$

$\lim_{i \to \infty} x_i = x \;\Rightarrow\; \lim_{i \to \infty} f(x_i)$ exists and is $f(x)$.

Proof 

(i) If $f$ is continuous, for any neighbourhood $N(f(x))$ of $f(x)$ there is a
neighbourhood $N'(x)$ of $x$ such that $f(N'(x)) \subseteq N(f(x))$ (Lemma 1.05,
rephrased). So since for any sequence

$f(x_i) \notin N(f(x)) \Rightarrow f(x_i) \notin f(N'(x)) \Rightarrow x_i \notin N'(x)$,

we have

$A = \{ i \mid f(x_i) \notin N(f(x)) \} \subseteq \{ i \mid x_i \notin N'(x) \} = B$.

If $\lim_{i \to \infty} x_i = x$, $B$ must be finite by definition and hence so
must $A$. Thus $f$ preserves limits.

(ii) If $f$ is not continuous at some $x$, then for some neighbourhood $N(f(x))$
of $f(x)$ every neighbourhood $N'(x)$ of $x$ contains points $y$ such that
$f(y) \notin N(f(x))$ (Fig. 2.2). Choosing any metric $d$ on $X$ such that the
corresponding topology is $T$, we take a sequence

$N_i(x) = B(x, \tfrac{1}{i})$

of open balls in this metric, which are all neighbourhoods of $x$. Hence in
each $N_i$ we can choose $y_i$ such that $f(y_i) \notin N(f(x))$. Now clearly the
$y_i$ converge to $x$ (details, Exercise 2), but the sequence $f(y_i)$ stays
outside $N(f(x))$ and cannot therefore converge to $f(x)$: so $f$ does not
preserve limits.

Hence if $f$ does preserve limits, it must be continuous. □

Notice that continuity always implies preserving limits: only the converse 
depended on metrisability of X. For full generality we could have replaced 
preserving limits by preserving the operation of closure. However, in the 
sequel we shall be dealing only with metrisable topologies. (We just don’t 
want to confuse ourselves by a choice of metric; in spacetime that would turn 
out to depend on a choice of basis - whereas the topology which we shall 
use does not.) It will be safe throughout this book therefore to think of a 
topology as a minimum structure allowing us to take limits, and continuity 
as the preservation of this structure. 

In analysis, this view of the nature of topologies and continuity is the
most central; several kinds of convergence are juggled in the average
infinite-dimensional proof.


Exercises VI.2

1. a) Show that if $x$, $y$ are both limits of a sequence $x_i$ of points in a
topological space $X$, any neighbourhoods $N(x)$, $N(y)$ of $x$, $y$ have $x_i$
in common for all but finitely many $i$. Deduce that if $X$ is Hausdorff,
then $x = y$.

b) The sequence $S : \mathbf{N} \to \mathbf{R} : i \mapsto (-1)^i$ has no limit,
in the sense of Definition 2.01. Find convergent subsequences of $S$ converging
to different points.

c) Prove, for any sequence $S$ in a topological space $X$, that if
$\lim_{i \to \infty} S(i) = x$, all subsequences of $S$ converge to $x$.

d) You may be meeting topologies explicitly for the first time here, but
you will have been taught a definition of convergence for a sequence
(recall that a sequence, unlike a series, involves no adding up). Either

(i) show that this is equivalent to Definition 2.01 in the case of
$\mathbf{R}$ with the usual metric topology, or

(ii) show that it is not by producing a sequence that converges by one
definition and not by the other.

In case (ii) destroy (or sell to an enemy) the text using the other
definition: it is still mentally in the confusion about continuity that
was only cleared up around the end of last century. Any definition not
equivalent to 2.01 is known by bitter experience to bring chaos in its
train.

2. Show that in a metric space $X$:

a) Every open ball $B(x,\delta)$ must contain an open ball of the form
$B(x,\tfrac{1}{n})$ for some $n \in \mathbf{N}$.

b) Every neighbourhood $N(x)$ of a point $x$ must contain at least one of
the open balls $B(x,\tfrac{1}{n})$, $n \in \mathbf{N}$, and hence all but
finitely many of them.

c) Deduce that the sequence $y_i$ in the proof of Lemma 2.02 converges to
$x$.


3. The Usual Topology 

There is only one useful topology on any finite-dimensional affine or vector
space, but a great many ways to define it. From a coordinate and limit point
of view, it is described very simply. A sequence of vectors in $\mathbf{R}^n$
converges if and only if each sequence of $j$-th coordinates does. The $j$-th
coordinate of the limit is then the limit of the $j$-th coordinates, as one
would hope and expect. However, it is not transparently obvious that this
means the same in all coordinate systems. The easiest proof that it does is
to give a coordinate-free definition and show that it reduces to this form in
any coordinates. We may approach such a definition as follows.

Another viewpoint on the nature of a topology on a set $X$ (and a very
powerful one when formalised) is as a rule for which functions on $X$ are to be
considered continuous. For example, if all functions on $X$ are continuous, all
subsets of $X$ must be open; this is called the discrete topology, and is useful
surprisingly often. Now, for $X$ a vector space the mildest requirement that
we can reasonably make (in the absence of any extra structure on $X$), and
still expect to relate the topology to the linear structure, is that at least
all linear functionals $X \to \mathbf{R}$ should be continuous. In finite
dimensions it turns out that this is enough to define the usual topology; in
infinite dimensions it defines a topology, but no one topology is “usual”.

3.01. Definition. The weak topology on a vector space $V$ is the smallest
(Exercise 1a) family $T$ of subsets of $V$ such that

W i) $T$ is a topology.

W ii) For any linear functional $f : V \to \mathbf{R}$ and open set
$U \subseteq \mathbf{R}$,

$f^{\leftarrow}(U) \in T$.



Open sets in $\mathbf{R}$ are exactly unions of sets of open intervals
(Exercise 1b,c). Hence the sets of the form $f^{\leftarrow}(U)$ in $V$ are the
unions of sets of infinite slabs (Fig. 3.1a,b) which do not include their
boundary hyperplanes. (These latter are lines and planes for
$V = \mathbf{R}^2, \mathbf{R}^3$ respectively.) Lacking infinite space, we
show only the heart of each slab.

But to satisfy W i) we must also include finite intersections of such slabs 
or sets of slabs (Fig 3.1c,d,e), which gives us chunks of all flat-sided shapes 
and sizes, and infinite unions of such chunks, by which we can build up 
rounded figures (Exercise 2). 

Now the condition that all linear functionals be continuous is a large 
one at first sight; to check the truth of it for a topology, we can reduce the 
work considerably via the following: 

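3.02. Lemma. For any topological space $X$, the sum $\sum_{i=1}^{n} f_i$ of a
finite set of continuous functions $f_1,\dots,f_n : X \to \mathbf{R}$ is again
continuous.

Proof. Represent the usual topology on $\mathbf{R}$ by the usual metric (1.07).
For any $x \in X$, and any positive $\varepsilon_1,\dots,\varepsilon_n \in \mathbf{R}$,
there exist by hypothesis open sets $U_1,\dots,U_n \subseteq X$ containing $x$
such that

$f_i(U_i) \subseteq B(f_i(x),\varepsilon_i) = \,]f_i(x) - \varepsilon_i,\, f_i(x) + \varepsilon_i[$, $\quad i = 1,\dots,n$,

since $B(f_i(x),\varepsilon_i)$ is an open set.

So for any $0 < \varepsilon \in \mathbf{R}$ we can set each
$\varepsilon_i = \varepsilon/n$, find corresponding $U_i$ and define

$U = \bigcap_{i=1}^{n} U_i$.

This is again an open set by OB, containing $x$ because each $U_i$ does.
Moreover,

$y \in U \Rightarrow y \in U_i$ for each $i = 1,\dots,n$

$\Rightarrow f_i(y) \in B(f_i(x),\varepsilon_i)$ for each $i$

$\Rightarrow |f_i(y) - f_i(x)| < \dfrac{\varepsilon}{n}$ for each $i$

$\Rightarrow \sum_{i=1}^{n} |f_i(y) - f_i(x)| < \varepsilon$

$\Rightarrow \left| \sum_{i=1}^{n} f_i(y) - \sum_{i=1}^{n} f_i(x) \right| < \varepsilon$ (Exercise 3)

$\Rightarrow \left( \sum_{i=1}^{n} f_i \right)(y) \in B\left( \left( \sum_{i=1}^{n} f_i \right)(x), \varepsilon \right)$.

Thus $\left( \sum_{i=1}^{n} f_i \right)(U) \subseteq B\left( \left( \sum_{i=1}^{n} f_i \right)(x), \varepsilon \right)$,
as required for continuity. □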

3.03. Corollary. For any basis $b_1,\dots,b_n$ of a finite-dimensional vector
space $V$, with some topology $T$, we have all $f \in V^*$ continuous if and only
if the vectors of the dual basis are continuous.

Proof.

(i) If all covariant vectors are continuous, that includes $b^1,\dots,b^n$.

(ii) Any $f \in V^*$ is a linear combination of $b^1,\dots,b^n$; that is, exactly
a sum of scalar multiples of them. Since any scalar multiple of a continuous
function is continuous (Exercise 4), and a finite sum of continuous functions
is continuous (Lemma 3.02), if the $b^i$'s are continuous then so is $f$. □

This means that all the open sets of the weak topology are also open in
the open box topology for any choice of coordinates:
we could replace W ii) by

W ii*) For every open set $U \subseteq \mathbf{R}$, each $(b^i)^{\leftarrow}(U) \in T$.

That would replace Fig. 3.1c,d,e by pictures like Fig. 3.2a,b without changing
the topology.

Fig. 3.2


We have shown W ii) and W ii*) to be equivalent. Equivalence follows for 
the definitions “The smallest family of subsets of X such that W i) and W ii) 
hold” and “The smallest family such that W i) and W ii*) hold”. Therefore 
the two topologies must be the same, and we can build up precisely the same 
collection of open sets by taking infinite unions of open boxes as by taking 
unions of the more arbitrary chunks of Fig. 3.1c,d,e. 

The same topology, then, goes under two names, depending on the choice 
of definition, and we could find others (cf. Exercise 5). Since they all refer 
to the same thing, we shall agree to call it the usual topology on a
finite-dimensional vector space. We may use other names to specify that we are
about to invoke a particular definition useful for the computation or argument 
we are about to develop. Since we have proved the open-box topology for 
any choice of coordinates to be the same as the weak topology, it is worth 
pointing out explicitly that we have proved: 

3.04. Theorem. The open box topology is invariant with respect to change 
of basis. □ 

Similarly we have 

3.05. Theorem. If $X$ is an affine space with vector space $T$, then the
topology defined on $X$ by choosing a chart $C_a$ (cf. II.1.08) on $X$ and setting

$S \subseteq X$ is open in $X$ $\iff$ $C_a(S)$ is open in $\mathbf{R}^n$ with the usual topology,

does not depend on the choice of chart. (We call this also the usual topology
on an affine space.) □


3.06. Theorem. If $V$, $W$ are finite-dimensional vector spaces, then all linear
maps $A : V \to W$ are continuous in the usual topology. (We have only defined
this to be true for $W = \mathbf{R}$.)

Proof. Choose a basis $b_1,\dots,b_n$ for $W$, and consider an arbitrary open box

$U = \{ (v^1,\dots,v^n) \in W \mid v^1 \in \,]a_1,b_1[,\; v^2 \in \,]a_2,b_2[,\; \dots,\; v^n \in \,]a_n,b_n[ \,\} = \bigcap_{i=1}^{n} (b^i)^{\leftarrow}(I_i)$,

where the $I_i = \,]a_i,b_i[$ are open intervals in $\mathbf{R}$. Now the $n$ maps

$b^i \circ A : V \to \mathbf{R}$

are linear (being composites of linear maps) and have range $\mathbf{R}$. So by
W ii) the sets $(b^i \circ A)^{\leftarrow}(I_i)$ are open in $V$. Therefore so
also is their intersection. However,

$v \in \bigcap_{i=1}^{n} (b^i \circ A)^{\leftarrow}(I_i) \iff b^i \circ A(v) \in I_i$, each $i$

$\iff A(v) \in (b^i)^{\leftarrow}(I_i)$, each $i$

$\iff A(v) \in U$.

So $A^{\leftarrow}(U) = \bigcap_{i=1}^{n} (b^i \circ A)^{\leftarrow}(I_i)$, which
we have just shown to be open.

Hence for $U$ an open box, $A$ satisfies the condition of 1.09. By Exercise 2
an arbitrary open set $U'$ is a union $\bigcup_\alpha U_\alpha$ of open boxes,
so $A^{\leftarrow}(U')$ is a union $\bigcup_\alpha A^{\leftarrow}(U_\alpha)$ of
open sets and hence again it is open. Thus $A$ satisfies 1.09 completely, and
is continuous. □

3.07. Corollary. If $X$, $Y$ are finite-dimensional affine spaces, all affine
maps $X \to Y$ are continuous in the usual topology. □

The open box topology, viewed as a topology on
$\mathbf{R}^n = \mathbf{R} \times \mathbf{R} \times \dots \times \mathbf{R}$, is
a special case of the following useful tool. We have seen how often products
of sets are convenient; here we add some extra structure:


3.08. Definition. Given topological spaces $X_1,\dots,X_n$, the product topology
on the set $X_1 \times X_2 \times \dots \times X_n$ is defined to be the
collection of all unions of sets of the form

$U_1 \times \dots \times U_n \subseteq X_1 \times \dots \times X_n$

where each $U_i$ is open in $X_i$ (cf. Exercise 6a).

The open box topology on R n illustrates this so well that further pictures 
should not be necessary. With this device, we can prove very easily the 
following geometrically useful result. 

3.09. Lemma. If $F : V \times V \to \mathbf{R}$ is any bilinear form on a
finite-dimensional vector space $V$, then

$f : V \to \mathbf{R} : v \mapsto F(v,v)$

is continuous. (This function is called the quadratic form corresponding to $F$
because for all $\lambda \in \mathbf{R}$, $v \in V$ we have
$f(\lambda v) = \lambda^2 f(v)$.)

Fig. 3.3

Proof. $f$ is the composite of the diagonal map

$\mathrm{Diag} : V \to V \times V : v \mapsto (v,v)$

(Fig. 3.3, with [0,1] instead of $V$, explains the name) and $F$, which in turn
is the composite (cf. V.1.03) of the tensor product

$\otimes : V \times V \to V \otimes V$

and a linear map $\hat{F} : V \otimes V \to \mathbf{R}$.

Now $\hat{F}$ is continuous by the definition of the topology on $V \otimes V$,
and $\otimes$ is continuous as a special case of Exercise 6b. Also Diag is
continuous because $\mathrm{Diag}^{\leftarrow}(U_1 \times U_2)$, where
$U_1$, $U_2$ are open sets in $V$, is exactly $U_1 \cap U_2$ which is again open
(continuity follows as in Theorem 3.06). Hence by Lemma 1.10
$f$ is continuous. □
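
The factorisation $f = F \circ \mathrm{Diag}$ and the scaling rule $f(\lambda v) = \lambda^2 f(v)$ are easy to check numerically in coordinates. The sketch below is an illustration under stated assumptions: the bilinear form is represented by a hypothetical component matrix `M`, so that $F(u,v) = \sum_{i,j} M_{ij} u^i v^j$, and the helper names are not from the text.

```python
from fractions import Fraction

def bilinear(M, u, v):
    """F(u, v) = sum over i, j of M[i][j] * u[i] * v[j]."""
    return sum(M[i][j] * u[i] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def diag(v):
    """The diagonal map Diag : v -> (v, v)."""
    return (v, v)

def quadratic(M, v):
    """The quadratic form f = F o Diag, that is v -> F(v, v)."""
    return bilinear(M, *diag(v))

M = [[1, 2], [0, -1]]            # components of F; F need not be symmetric
v = [Fraction(3), Fraction(-2)]
lam = Fraction(5, 2)
scaled = [lam * c for c in v]

print(quadratic(M, v))
print(quadratic(M, scaled) == lam ** 2 * quadratic(M, v))
```

Exact rational arithmetic is used so that the identity holds without rounding error.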


Exercises VI.3

1. a) If $\{ T_k \mid k \in K \}$ is a (non-empty, perhaps infinite) family of
topologies satisfying 3.01 W ii), show that $\bigcap_{k \in K} T_k$ satisfies
W i), W ii). Deduce that if any $T$ satisfies W i), W ii) then there is a
smallest (contained in all others) such $T$, and hence that the weak topology
exists.

b) The open intervals $]a,b[ \,= \{ x \mid a < x < b \} \subseteq \mathbf{R}$
are indeed open in the sense of Definition 1.04.

c) Any open set $U$ in $\mathbf{R}$ is the union of a set of open intervals.
(Hint: use the open intervals that, by the metric definition of open, surround
each $x \in U$.)

Fig. 3.4

2. a) Express the open ball $B(0,1) = \{ (x,y) \mid x^2 + y^2 < 1 \}$ in
$\mathbf{R}^2$ with the standard metric as a union of rectangular slabs.
(Hint: Fig. 3.4.)

b) Show that if $\mathcal{H}$ is any collection of subsets of a set $X$, the set
$\bar{\mathcal{H}}$ of arbitrary unions of finite intersections of sets in
$\mathcal{H}$, together with $\emptyset$ and $X$, satisfies OA-OC and is thus a
topology for $X$. ($\mathcal{H}$ is then called a sub-basis for the topology
$\bar{\mathcal{H}}$, which is generated by $\mathcal{H}$.)

c) Show that the topology generated by $\mathcal{H}$ is the smallest topology in
which all the sets of $\mathcal{H}$ are open, in the sense that if $T$ is another
topology such that $\mathcal{H} \subseteq T$ we have
$\bar{\mathcal{H}} \subseteq T$.

d) Show that the product topology (Definition 3.08), generated by the
products of open sets $U_i$, is the smallest topology which makes the $n$
projections

$\pi_i : X_1 \times X_2 \times \dots \times X_n \to X_i : (x_1,x_2,\dots,x_n) \mapsto x_i$

continuous.

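3. Show by induction from the triangle inequality, $|a + b| \le |a| + |b|$ in
the case of real numbers, that for any finite set
$a_1,\dots,a_n, b_1,\dots,b_n \in \mathbf{R}$ we have

$\left| \sum_{i=1}^{n} a_i - \sum_{i=1}^{n} b_i \right| \le \sum_{i=1}^{n} |a_i - b_i|$.

4. If $X$ is a topological space, $f : X \to \mathbf{R}$ is continuous, and
$0 \ne \lambda \in \mathbf{R}$, show that for any $x \in X$,
$0 < \varepsilon \in \mathbf{R}$ there is an open neighbourhood $N(x)$ of $x$
such that

$f(N(x)) \subseteq B\left( f(x), \dfrac{\varepsilon}{|\lambda|} \right)$.

Deduce that the function $\lambda f : X \to \mathbf{R} : x \mapsto \lambda (f(x))$
is continuous.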

5. a) Show that for any choice of basis on a vector space $X$ the function

$d_s : X \times X \to \mathbf{R} : ((x^1,\dots,x^n),(y^1,\dots,y^n)) \mapsto \max\{ |x^1 - y^1|, \dots, |x^n - y^n| \}$

is a (nontensorial) metric, that the corresponding open balls are open
boxes in this basis, and that the corresponding topology is the usual
topology. ($d_s$ is called the square metric because of the shape of its
open balls in $\mathbf{R}^2$ with the standard basis.)

b) Show by using the square metric that a sequence
$x_i = (x^1(i),\dots,x^n(i))$ in a vector space $X$ converges in the usual
topology if and only if each coordinate does, and that

$\lim_{i \to \infty} x_i = \left( \lim_{i \to \infty} x^1(i), \dots, \lim_{i \to \infty} x^n(i) \right)$.

c) Show that the diamond metric $d_d$ on a vector space $X$

$d_d : X \times X \to \mathbf{R} : ((x^1,\dots,x^n),(y^1,\dots,y^n)) \mapsto |x^1 - y^1| + \dots + |x^n - y^n|$

is indeed a metric, draw $B((0,0),1) \subseteq \mathbf{R}^2$ with this metric,
and use Lemma 1.12 to prove that $d_s$ and $d_d$ give the same topology. (Notice
that this “mutual inclusion” argument is much less work than expressing open
sets in one directly as unions of explicitly defined open sets in the other,
in the manner of Exercise 2a.)

d) Show that the Euclidean metric

$d_e : \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R} : ((x^1,\dots,x^n),(y^1,\dots,y^n)) \mapsto +\sqrt{(x^1 - y^1)^2 + \dots + (x^n - y^n)^2}$

is a metric, draw $B((0,0),1) \subseteq \mathbf{R}^2$ with this metric, and show
that it gives the usual topology.

(The diamond and square metrics are much the most useful metrisations
of the usual topology, since they do not involve square roots.
And in spacetime unlike space, even the Euclidean metric is not
independent of choice of orthonormal basis; it varies with the choice of
“timelike” basis vector. So being neither invariant nor easy to do sums
with, it is of little use.)

6. a) The product topology defined in 3.08 is Hausdorff if all the $X_i$ are
Hausdorff spaces. Is the converse true?

b) Prove that
$\otimes : X_1 \times X_2 \times \dots \times X_n \to X_1 \otimes X_2 \otimes \dots \otimes X_n$
is continuous, using the product topology on its domain and the usual vector
space topology on its image.

7. a) If a metric space $(X,d)$ has the metric topology and $X \times X$ the
corresponding product topology, show that in the usual topology on
$\mathbf{R}$, $d : X \times X \to \mathbf{R}$ is continuous.

b) Deduce by 2.02 that for $i \mapsto x_i$, $i \mapsto y_i$ sequences in $X$,

$\lim_{n \to \infty} d(x_n,y_n) = d\left( \lim_{n \to \infty} x_n, \lim_{n \to \infty} y_n \right)$

whenever the limits on the right exist.

8. a) Show that if $\|\ \|$ is a norm (IV.1.06) on $X$, $d(x,y) = \|x - y\|$
defines a metric.

b) Show that each of the metrics of Exercise 5 is given in this way by a
norm, $\|x\| = d(0,x)$.

c) Show from Axioms IV.1.06 that if $\|\ \|$, $\|\ \|'$ are norms on
finite-dimensional $X$, there exist $\lambda, \lambda' > 0$ such that for any
$x \in X$,

$\lambda \|x\| \le \|x\|' \le \lambda' \|x\|$.

(Pick a basis and show that $\|a^i b_i\| \le |a^i| \cdot \|b_i\|$.) Deduce that
the metric on $X$ given by any norm defines the usual topology.


4. Compactness and Completeness 

We have already (IV.4.01) had to make use of an essentially topological
argument, in proving that we could diagonalise symmetric operators. (In other
books you may find proofs which look purely algebraic, involving the complex
numbers. In fact they require the so-called Fundamental Theorem of
Algebra, which is actually a topological result, depending crucially on the
completeness of the complex plane.) To prove the existence of maximal
vectors, around which the proof IV.4.05 revolved, we must first look a little
more closely at the topological properties of the real numbers, which we then
extend to real vector spaces.

The first notion we need is that R is complete . There are many different 
ways of defining and proving this property. To prove it one must by one or 
another method construct the real numbers from the rationals, or even the 
integers, which would be out of place here. We shall therefore take it as an 
axiom, in a form in which it is clear that its failure would do such violence 
to our intuition of continuity (which it is one of the purposes of the real 
number system to express) that the real numbers would have been forgotten 
long ago. If this leaves you still wanting a proof, consult an analysis book 
that constructs the real numbers by Cauchy sequences, Dedekind cuts, or 
whatever. 

4.01. Completeness Axiom. The Intermediate Value Theorem is true of
the real numbers.

Fig. 4.1

The Intermediate Value Theorem (not Axiom, because most books prefer
to start from a less comprehensive equivalent statement and prove it from
that) says that if a function

$f : [0,1] \to \mathbf{R}$

is continuous, and for some $v \in \mathbf{R}$ we have

$f(1) < v < f(0)$

or

$f(0) < v < f(1)$

then we must have at least one $x \in [0,1]$ such that $f(x) = v$ (the
“intermediate value” in question). That is, the graph of $f$ must cross the
level $v$ somewhere. (There are, incidentally, a few mathematicians who refuse
to believe this, or rather say it has no meaning in their terms. But then, a
really pure mathematician can disbelieve anything.)
We can tidy the statement a little. If we have an $f$ which does not take
the value $v$, then we can get a new function

$$\tilde f : [0,1] \to \mathbf{R} : x \mapsto \frac{f(x) - v}{|f(x) - v|}$$

which is still continuous (Exercise 1d), and takes only the values $+1$ and $-1$,
with no intermediate values at all. Thus the Intermediate Value Theorem
above is equivalent to the following statement, which is the form it is most
convenient to use:

There exists no continuous map $f : [0,1] \to \mathbf{R}$ taking only the values $-1$,
$+1$, with $f(0) = 1$, $f(1) = -1$.

Notice that this is not true if we replace $[0,1]$ by the set $Q$ of rational
numbers $x$ such that $0 \le x \le 1$, for if we define

$$f : Q \to \mathbf{R} : x \mapsto \begin{cases} +1 & \text{if } x^2 < \tfrac12 \\ -1 & \text{if } x^2 > \tfrac12 \end{cases}$$

the “point of discontinuity” is missing from $Q$ because it is irrational, so
on its domain of definition $f$ is continuous. Hence the name “completeness”;
the assertion is that the real unit interval, and hence the real line, does not
have “missing points” of this kind. (“Complete” is defined more generally in
the Appendix.)
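
The missing point can be probed numerically. The following sketch, in Python with exact rational arithmetic, assumes the two-valued function takes $+1$ where $x^2 < \tfrac12$ and $-1$ where $x^2 > \tfrac12$; this sign convention and the names `step` and `squeeze_jump` are illustrative assumptions, not taken from the text. Repeated halving traps the jump in ever shorter rational intervals without ever landing on it.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def step(x):
    """Two-valued function on the rationals in [0, 1] (assumed sign convention)."""
    square = x * x
    if square == HALF:
        # Never reached for rational x: the square root of 1/2 is irrational.
        raise ValueError("no rational number has square 1/2")
    return 1 if square < HALF else -1

def squeeze_jump(steps):
    """Halve [0, 1] repeatedly, keeping the half across which step changes sign."""
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if step(mid) > 0:
            lo = mid   # the jump lies to the right of mid
        else:
            hi = mid   # the jump lies to the left of mid
    return lo, hi
```

However many steps are taken, the function is constant on each side of the jump and the exception is never raised, which is the numerical shadow of continuity on the rational domain.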

We must now prove one of the equivalent statements, as a necessary tool
to reach our main goal of exploring the complementary notion of compactness.

4.02. Lemma. If we have a sequence $J_1, J_2, J_3, \ldots$ of closed (cf. Exercise 2)
subintervals $J_i = [j_i, k_i] \subseteq [0,1]$ of the unit interval, such that
$J_{i+1} \subseteq J_i$ for each $i$ (Fig. 4.2), then

$$\bigcap_{i \in \mathbf{N}} J_i \neq \emptyset .$$

That is to say, there is at least one point $x \in [0,1]$ which is in every $J_i$.

Proof Suppose not.

Then for each point $x \in [0,1]$ there is at least one $i$ (and hence all
greater $i$) such that $x \notin J_i$. Therefore either $x < j_i$ or $x > k_i$: $x$ is either
to the left or to the right of the whole interval $J_i$. Define then

$$f : [0,1] \to \mathbf{R} : x \mapsto \begin{cases} -1 & \text{if } x < j_i \text{ for some } i \\ +1 & \text{if } x > k_i \text{ for some } i. \end{cases}$$

Now $f$ is well defined. (Since if $x$ was to the left of some interval $J_i$ and to
the right of another interval $J_{i'}$ we could not have either $J_i \subseteq J_{i'}$ or
$J_{i'} \subseteq J_i$, contradicting the given fact that always $J_m \subseteq J_n$ when $m > n$,
and one of $i$, $i'$ must be greater.) It is also continuous at each $x \in [0,1]$,
since if $x < j_i$, say, so that $f(x) = -1$, we have

$$y \in B(x, \tfrac12 (j_i - x)) \Rightarrow y < j_i$$

so that $f(B(x, \tfrac12 (j_i - x))) = \{-1\} \subseteq B(-1, \varepsilon)$ for any positive
$\varepsilon$; similarly for $x > k_i$. Now if 0 is not to the left of any $J_i$, it is in
each $J_i$, and hence in $\bigcap_{i \in \mathbf{N}} J_i$, which is not therefore empty. So if
it is empty, $f(0) = -1$. Similarly $f(1) = +1$. Thus if the supposition that no
point is in every $J_i$ is true, we have a function which contradicts the
Intermediate Value Theorem. □
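
A small numerical contrast may help to fix the role of the word “closed”. The sketch below (Python, exact rationals; the function names are hypothetical) compares the closed intervals $[0, 1/i]$, which all contain 0, with the open intervals $(0, 1/i)$ of Exercise 2, from which every candidate point is eventually excluded. It is a finite check under these assumptions, not a proof.

```python
from fractions import Fraction

def in_all_closed(x, n):
    """True if x lies in every closed interval [0, 1/i] for i = 1..n."""
    return all(0 <= x <= Fraction(1, i) for i in range(1, n + 1))

def first_open_interval_missing(x):
    """Smallest i such that the rational x is outside the open interval (0, 1/i)."""
    if x <= 0:
        return 1  # 0 and negative numbers are already outside (0, 1)
    i = 1
    while x < Fraction(1, i):
        i += 1
    return i
```

For example, `first_open_interval_missing(Fraction(1, 7))` stops at the interval $(0, 1/7)$, which is the first one not to contain $1/7$, while `in_all_closed(Fraction(0), n)` holds for every `n`.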

We now come to one of the characteristic properties of compact spaces - 
sometimes taken as a definition of compactness. We shall defer our more 
limited definition a little longer. 



Fig. 4.2 

4.03. Theorem. If $S : \mathbf{N} \to [0,1] : i \mapsto x_i$ is any sequence of points in the
unit interval, then $S$ has at least one convergent subsequence.

Proof If $x_i \in [0, \tfrac12]$ for only a finite set of values of $i$, we must have
$x_i \in [\tfrac12, 1]$ for an infinite set, and vice versa, since $\mathbf{N}$ is infinite. So we
can choose a half-interval $J_1$ from $[0, \tfrac12]$ and $[\tfrac12, 1]$ in which $S$ takes
values infinitely many times (if it does so in both, let us agree to take the
left one). Next, by the same argument we can choose a closed half $J_2$ of $J_1$
in which $S$ takes values $x_i$ for infinitely many $i$, and so on. (Notice that we
do not say “takes infinitely many values”; if $x_i = \tfrac12$ for all $i$, $S$ converges
by taking just one value - infinitely many times.) Thus we get a sequence

$$[0,1] \supseteq J_1 \supseteq J_2 \supseteq J_3 \supseteq \ldots$$

of closed intervals, as in Fig. 4.3, which must then by 4.02 have a point $x$ in
common. Moreover, in each $J_i$ we know that $S$ takes values infinitely many
times, so we can choose a subsequence $S'$ of $S$ by

$$x'_j = \text{first } x_i \text{ after } x'_{j-1} \text{ to be inside } J_j$$

and still have an infinite sequence. But now each $x'_j \in J_j \subseteq J_k$ if $k < j$, so
$S'$ takes values at most $j - 1$ times outside any $J_j$.

Fig. 4.3

Now every open ball $B(x, \varepsilon)$ around $x$ must contain at least one of the $J_i$,
because

$$\begin{aligned}
y \in J_i &\Rightarrow |x - y| \le \operatorname{length}(J_i) , \text{ since } x \in J_i \\
&\Rightarrow |x - y| \le \frac{1}{2^i} \\
&\Rightarrow |x - y| < \varepsilon , \text{ if we choose } i \text{ large enough.}
\end{aligned}$$

Hence $B(x, \varepsilon)$ contains $x'_i$ for all but finitely many $i$. Since every
neighbourhood of $x$ contains some $B(x, \varepsilon)$, this extends to all
neighbourhoods $N(x)$ of $x$ and we have

$$\lim_{i \to \infty} x'_i = x$$

so that $S'$ is a convergent subsequence of $S$. □
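
The halving construction in this proof can be imitated on a finite sample of a sequence. The sketch below is only a heuristic stand-in: a program cannot test “takes values infinitely many times”, so it assumes instead that the half containing more of the remaining sampled terms is the right one to keep (ties go to the left half). The function name and this selection rule are assumptions made for illustration.

```python
def halving_subsequence(xs, depth):
    """Pick indices i_1 < i_2 < ... with xs[i_j] in the j-th nested half-interval."""
    lo, hi = 0.0, 1.0
    candidates = list(range(len(xs)))  # indices whose terms lie in [lo, hi]
    chosen, last = [], -1
    for _ in range(depth):
        mid = (lo + hi) / 2
        left = [i for i in candidates if lo <= xs[i] <= mid]
        right = [i for i in candidates if mid <= xs[i] <= hi]
        # Finite proxy for "infinitely many terms": keep the fuller half.
        if len(left) >= len(right):
            hi, candidates = mid, left
        else:
            lo, candidates = mid, right
        later = [i for i in candidates if i > last]
        if not later:
            break  # the finite sample is exhausted
        last = later[0]  # first term after the previous choice, inside J_j
        chosen.append(last)
    return (lo, hi), chosen
```

Applied to a sample of a sequence that oscillates between neighbourhoods of 0 and of 1, the routine settles on one of the two accumulation points and returns strictly increasing indices, as the subsequence $S'$ of the proof requires.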

Notice that the choice of convergent subsequence was not necessarily 
unique. Indeed, from any sequence of all the rational numbers from 0 to 1 
(such as is used to prove that they are countable) we can choose a subsequence 
converging to any chosen one of the uncountable set of real numbers in [0,1]. 
(If you don’t know about uncountable infinities, ignore this remark.) 

4.04. Corollary. The same is true for a sequence in any closed interval $[a, b]$.

Proof Consider $\varphi : [0,1] \to [a, b] : x \mapsto (b - a)x + a$, and its inverse
$\varphi^{\leftarrow} : [a, b] \to [0,1] : y \mapsto (y - a)/(b - a)$.

These are affine maps, hence continuous by 3.07.

If $S$ is a sequence in $[a, b]$, $\varphi^{\leftarrow} \circ S$ is a sequence in $[0,1]$, which has a
convergent subsequence $S'$ by the theorem, with limit $x$, say. Then $\varphi \circ S'$ is
a subsequence of $S$, and by 2.02 we have

$$\lim_{i \to \infty} (\varphi \circ S') = \varphi(x) . \qquad \Box$$

This property, of any sequence having a convergent subsequence, is one of
the several equivalent definitions of compactness. We shall not need
compactness in full generality, however; we want it for rather more limited
purposes than the usual mathematics text. Therefore, since there is a nice
geometrical characterisation of compact sets in finite-dimensional vector or
affine spaces, we shall consider it only for such embedded sets, not abstractly.

Notice that two characteristics are necessary for the unit interval $[0,1]$ to
have the convergent subsequence property: it is closed topologically, and it
is bounded. That is (giving the definition a number, since it is so important):

4.05. Definition. A set $S \subseteq \mathbf{R}$ is bounded if we can find a bound $b \in \mathbf{R}$
such that

$$x \in S \Rightarrow |x| < b .$$

If a set $S \subseteq \mathbf{R}$ is not closed, there is some boundary point $x$ of $S$ not
in $S$: a sequence of points in $S$ (and hence all its subsequences) can converge
to $x$ and thus not converge to any point in $S$. If $S$ is not bounded, we can
choose a sequence $x_i$ in $S$ such that $|x_n| > n$ for each $n \in \mathbf{N}$, so that the
sequence and all its subsequences “go to infinity” and cannot converge to any
real number, let alone one in $S$.

In fact topologically (and even differentially) there is very little to choose
between not being closed and not being bounded; Fig. 4.4 shows the graph
of a nice (analytic) homeomorphism from the open unit interval (which
is bounded but not closed) to the whole real line (which is closed but not
bounded).

Fig. 4.4
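
For a concrete instance, one familiar analytic homeomorphism from $(0,1)$ onto $\mathbf{R}$ is $x \mapsto \tan(\pi(x - \tfrac12))$, with inverse $y \mapsto \tfrac12 + \arctan(y)/\pi$. This particular map is an assumption chosen for illustration and is not necessarily the one plotted in Fig. 4.4; the names `stretch` and `squash` are likewise hypothetical.

```python
import math

def stretch(x):
    """Map the open interval (0, 1) onto the real line."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return math.tan(math.pi * (x - 0.5))

def squash(y):
    """Inverse map, from the real line back into (0, 1)."""
    return 0.5 + math.atan(y) / math.pi
```

Points near the missing end points 0 and 1 are sent far out along the line, which is how a bounded but not closed set can be spread continuously over infinite length.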

If however we take a set C in R which is closed and bounded, the image of 
any continuous function from it to R will again be so (Exercise 3), even if we 
do not insist that the function be continuous (or even defined) outside C. This 
is a nice characteristic of the set and intrinsic to the set’s topology: unlike 
the open interval it cannot spread out continuously over infinite length. It 
is thus a sort of intrinsic “smallness” or “finiteness” property, for which the 
universal name has become compact.

For sets in a general finite-dimensional vector space, a very similar idea 
holds. Once again, we define it invariantly, and then reduce it to coordinates. 

4.06. Definition. A set $C$ in a finite-dimensional vector space $X$ is compact
if

(i) It is closed in the usual topology.

(ii) For any linear functional $f \in X^*$, $f(C) \subseteq \mathbf{R}$ is bounded. (This
obviously reduces to 4.05 if $X = \mathbf{R}$.)

4.07. Lemma. Choose any basis $b_1, \ldots, b_n$ for $X$. Then $b^1(C), \ldots, b^n(C) \subseteq \mathbf{R}$
are bounded if and only if all $f(C)$ are, for $f \in X^*$.

Proof Exactly in the style of that of 3.03. (Exercise 4a). □

4.08. Corollary. A set $S \subseteq X$ is bounded if and only if the values of the
coordinates of points in $S$ are all of modulus less than some $b \in \mathbf{R}$. That is,
$S$ is completely inside some box of side $2b$ (illustrated in Fig. 4.5 for $\mathbf{R}^3$). We
then say $S$ is bounded by $b$ with respect to these coordinates.



We are now ready for our main theorem about compactness, except for
a technical point which it is simpler not to dodge: in the continuous function
with no intermediate value on the rational unit interval we constructed above,
we took it for granted that we knew what “continuous on” a subset $S$ of $\mathbf{R}$
meant, even if the function was not defined on the rest of $\mathbf{R}$. In metric terms,
this is so obvious as to be hardly worth mentioning - if we have a subset $S$
of a metric space $(X, d)$ we get an induced metric on $S$ by restricting $d$ to
$S \times S \subseteq X \times X$. Then (cf. Chap. 0.§2) pairs of points in $S$ retain the
distances they had as points of $X$ and we carry on as before. This metric
is used explicitly in Exercise 3, for example. However, it is not always very
appropriate or convenient: if we always took the induced metric for surfaces
in $\mathbf{R}^3$, for instance, we would say that the distance from London to Sydney
was 7,900 miles, whereas the more useful distance is the one within
the surface, of 11,760 miles. Thus for a subset we may want a different metric,
but we usually want the induced notion of continuity. Hence we define

4.09. Definition. If $S$ is a subset of a topological space $X$, the induced
topology on $S$ is the collection of sets $\{\, S \cap U \mid U \text{ open in } X \,\}$.

These sets are called open in $S$; this does not mean that they are
necessarily open in $X$, if $S$ is not. For example, if $X = \mathbf{R}^2$ and
$S = \{\, (x,y) \mid x^2 = y \,\}$ (Fig. 4.6) the intersection of $S$ with the open disc
$U = \{\, (x,y) \mid x^2 + y^2 < 2 \,\}$ is neither open in $X$ (since it does not contain
a neighbourhood in the space $X$ of, for example, $(0,0)$) nor closed in $X$,
since it does not contain $(1,1)$ or $(-1,1)$ which are boundary points of it.
But by definition it is open in $S$. (cf. Exercise 5.)

Fig. 4.6

With this topology a subset S is called a subspace of X. When we 
need to distinguish other than by context we call it a topological subspace 
as distinct from the vector or affine subspaces we have had before. (Notice 
that the example of S in Fig. 4.6 is neither a vector nor an affine subspace 
of R 2 .) Sadly for the science-fiction fan, none of these kinds of subspaces can 
be dodged into for faster-than-light travel - they are all just subsets of what 
we started with, with restrictions of the structure we started with. 

With that technicality out of the way (if you are still uneasy about it, 
do Exercise 5) we can state and prove one of the main properties of compact 
sets, for which we have already found a use: 

4.10. Theorem. If $C$ is a compact set, in any finite-dimensional vector or
affine space$^1$ $X$, and $f : C \to \mathbf{R}$ is a continuous function with respect to the
induced topology on $C$, then $f(C)$ is bounded and closed.

Proof We shall assume $X$ a vector space (the proof transfers at once to affine
spaces).

Choose a basis, and let $C$ be bounded with respect to these coordinates
by $b$.

First we prove that $C$ has the convergent subsequence property.

Let $S$ be any sequence of points $c_i$ in $C$, and write $c_i$ in coordinates as
$(c^1(i), \ldots, c^n(i))$ - bringing the $i$'s into brackets to emphasise that they have
nothing to do with variance. Then we can choose a subsequence $c^1(j)$ of the
sequence $c^1(i)$ of first coordinates converging to (say) $x^1$ by 4.04, since $c^1(i)$
is a sequence in the closed interval $[-b, b]$. We then define $S^1$ as the
subsequence of $S$ that has $c^1(j)$ as its sequence of first coordinates. Next,
consider the sequence $c^2(j)$ of second coordinates of values of $S^1$, choose a
convergent subsequence and define $S^2$ as the subsequence of $S$ having the
chosen subsequence as second coordinates - converging to $x^2$ say. The first
coordinates still converge to $x^1$, because they are a subsequence of a
convergent sequence (cf. Exercise 2.1c).

$^1$ If we had given a more abstract definition of “compact”, this condition of being in
$X$ would be unnecessary. We are just saving effort and space.

Repeating this process $n$ times, we get a subsequence

$$S^n = (c^1(i), c^2(i), \ldots, c^n(i))$$

of $S$, with each sequence of $j$-th coordinates converging to some $x^j$. Hence,
by Exercise 3.5b, $S^n$ converges to $(x^1, \ldots, x^n)$, which must therefore be in $C$
since $C$ contains its boundary points - that is, all the points in $X$ to which
sequences in $C$ can converge are in $C$. By Exercise 5f, $S^n$ converges in the
topology on $C$.

(Notice that our argument depended on finite-dimensionality: the limiting
result of choosing a subsequence an infinite number of times might leave
us with no points.)

Now, if $f(C)$ is not bounded we can choose a sequence $x_i$ in $f(C)$ that
“goes to infinity” with all its subsequences. But for each $x_i$, since it is in
$f(C)$ we can choose a $c_i$ in $C$ such that $f(c_i) = x_i$. This gives us a sequence
in $C$, which must have a subsequence converging to a point $c$, say, in $C$, so
that (restricting to the values of $i$ in the subsequence)

$$\lim_{i \to \infty} x_i = \lim_{i \to \infty} f(c_i) = f(\lim_{i \to \infty} c_i) = f(c) , \text{ by 2.02.}$$

Thus we have found a convergent subsequence of $x_i$, contrary to assumption.
Therefore $f(C)$ must be bounded.

Similarly, if $x$ is a boundary point of $f(C)$, choose a sequence $x_i$ in $f(C)$
converging to $x$, a sequence $c_i$ in $C$ such that $f(c_i) = x_i$, and a convergent
subsequence $c'_i$ of $c_i$, with limit $c \in C$. Then if $x'_i = f(c'_i)$, we know $x'_i$ still
converges to $x$ by Exercise 2.1c, and

$$x = \lim_{i \to \infty} x'_i = \lim_{i \to \infty} f(c'_i) = f(\lim_{i \to \infty} c'_i) = f(c)$$

so $x \in f(C)$. Thus $f(C)$ is closed. □

4.11. Corollary. If $f : C \to \mathbf{R}$ is a continuous function on a compact set,
then $f$ has a maximum on $C$; that is, not only do the values of $f$ stay below
a certain level, but there is some $c \in C$ such that for all $c' \in C$,

$$f(c') \le f(c) .$$

(Fig. 4.7 shows a function defined on all of $\mathbf{R}$ that does not have
this property.)

Proof. We have shown that $f(C)$ is bounded, say by $k$. The existence within
$[-k, k]$ of a “top end” of the set $f(C)$ is another use of completeness: the set
of rationals $q$ in $[0,1]$ with $q^2 < \tfrac12$ has no meaningful “top end” within the
rationals. Precisely, we want some $x \in [-k, k]$ such that

(i) $x \ge y$ for all $y \in f(C)$.

(ii) We can find no finite length $\varepsilon$ by which $x$ is “above” $f(C)$. That is,
every $B(x, \varepsilon)$ contains at least one point of $f(C)$. That is exactly
the requirement that $x$ lies in the closure of $f(C)$.

The proof of existence of $x$ (Exercise 6) is precisely similar to the proof
of 4.02. Condition (ii) means that $x$ is either in $f(C)$ or a boundary point of
$f(C)$. Now, since $f(C)$ is closed, we have

$$x = f(c)$$

for some $c \in C$, so we have found a $c$ such that

$$c' \in C \Rightarrow f(c') \le f(c)$$

as required. □
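
To see the corollary at work, and how it can fail without compactness, consider the function $x \mapsto x^2/(1 + x^2)$. This example is an assumption made here for illustration (it may or may not be the function of Fig. 4.7). On each compact interval $[-k, k]$ it attains a maximum, at the end points, but on all of $\mathbf{R}$ its values creep up towards 1 without ever reaching it. The sketch samples the function on a grid, so it only approximates the maximum in general.

```python
def f(x):
    """Bounded above by 1 on the whole real line, but never equal to 1."""
    return x * x / (1.0 + x * x)

def max_on_interval(k, samples=2001):
    """Largest sampled value of f on the compact interval [-k, k], with its location."""
    points = [-k + 2.0 * k * i / (samples - 1) for i in range(samples)]
    best = max(points, key=f)
    return best, f(best)
```

Enlarging `k` raises the maximum each time, so no single point of $\mathbf{R}$ can serve as a maximum for the whole line.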

4.12. Corollary. Let $A$ be any operator on a finite-dimensional inner-product
space $X$. The function $x \mapsto Ax \cdot Ax$ on the unit sphere $S$ in $X$ has a
maximum value, $\|A\|$, which is attained by at least one vector $x \in S$. That is,
there exist maximal vectors for $A$. (cf. Chapter IV.4.01)

Proof $S$ is evidently bounded by 1 in orthonormal coordinates. By
Exercise 1.6a the set $\{1\} \subseteq \mathbf{R}$ is closed, hence since the function
$\| \ \| : x \mapsto \sqrt{x \cdot x}$ is continuous by 3.09, the set $S = (\| \ \|)^{\leftarrow}(\{1\})$ is
closed by Exercise 1.5c.

$S$ is therefore compact, and since the quadratic form $x \mapsto Ax \cdot Ax$ is
also continuous by 3.09, the result follows. □

Remark. If you are still unconvinced that topological reasoning is necessary 
to prove that we can diagonalise symmetric operators, do Exercise 7. 
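
A numerical search makes the existence of maximal vectors tangible for the matrix of Exercise 7. The sketch below (plain Python; the function names and the grid search are assumptions made for illustration, not a method from the text) samples the unit circle in $\mathbf{R}^2$ and keeps the unit vector $x$ for which $Ax \cdot Ax$ is largest.

```python
import math

A = ((1.0, 1.0), (1.0, 2.0))  # the symmetric matrix of Exercise 7

def squared_length_of_image(theta):
    """Ax . Ax for the unit vector x = (cos theta, sin theta)."""
    x, y = math.cos(theta), math.sin(theta)
    ax = A[0][0] * x + A[0][1] * y
    ay = A[1][0] * x + A[1][1] * y
    return ax * ax + ay * ay

def sampled_maximal_vector(samples=20000):
    """Best unit vector found on an evenly spaced grid of the unit circle."""
    thetas = [2.0 * math.pi * i / samples for i in range(samples)]
    best = max(thetas, key=squared_length_of_image)
    return (math.cos(best), math.sin(best)), squared_length_of_image(best)
```

For this matrix the larger eigenvalue is $(3 + \sqrt5)/2$, and the ratio of the components of the corresponding eigenvector is $(1 + \sqrt5)/2$. The search approaches this irrational direction but no vector with rational components can equal it, which is the point of Exercise 7.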

Compactness is an extremely powerful tool, and one that a mathematician
learns to use as readily as his fingers: “by compactness” is often an
acceptable substitute for an argument in full, since everyone is so familiar
with how compactness proofs go, and what can and cannot be done with
them. It is so powerful, in fact, that mathematicians feel very naked facing
the world without it, whereas physicists have to; for example the contours
in Fig. IV.2.3b,c, and the Lorentz group, are non-compact. Moreover, no
remotely reasonable spacetime can be compact. This is in sharp contrast
to “pure” differential geometry, which is largely conducted on nice compact
manifolds, with nice compact groups in the background. In consequence, we
shall not explore compactness further, since we shall have fewer opportunities
to use it than “pure mathematics” texts in the same area. It remains
however one of the central notions of topology, and the reader should seize
every opportunity to get better acquainted with it. It is less useful, because
less often applicable, in physics than in mathematics, but still essential as
we have just seen. Theorem IV.4.05, for instance, is a tool we could not do
without in what follows.


Exercises VI.4 


1. a) If the function $g : [0,1] \to \mathbf{R}$ is continuous, with $g(x) \ne 0$ for any
$x \in [0,1]$, show that $\frac{1}{g} : [0,1] \to \mathbf{R} : x \mapsto \frac{1}{g(x)}$ is also continuous.

b) If $g : [0,1] \to \mathbf{R}$ is continuous, show that $|g| : [0,1] \to \mathbf{R} : x \mapsto |g(x)|$ is
also continuous.

c) If $g, g' : [0,1] \to \mathbf{R}$ are continuous, show that $gg' : [0,1] \to \mathbf{R} :
x \mapsto g(x)g'(x)$ is continuous.

d) If $f(x) \ne v$ for any $x \in [0,1]$, where $f : [0,1] \to \mathbf{R}$ is continuous, show
that if

$$\tilde f(x) = \frac{f(x) - v}{|f(x) - v|}$$

then $\tilde f$ is continuous and $\tilde f(x) = -1$ or $+1$ according as $f(x) < v$ or
$f(x) > v$.

2. The intersection of the infinite family

$$U_n = \{\, x \mid 0 < x < \tfrac{1}{n} \,\}, \quad n \in \mathbf{N}$$

of open intervals is empty, so that the “closed” condition in 4.02 is
essential.

3. a) Show that any closed bounded set $C$ in $\mathbf{R}$ is contained in some closed
interval $[a, b]$, and deduce from 4.04 and the closedness of $C$ that any
sequence taking values in $C$ has a convergent subsequence with its
limit in $C$.

b) If $f$ is a continuous map $C \to \mathbf{R}$ (where continuity is defined on $C$ by
Definition 1.05, using the same metric $d(x, y) = |x - y|$ on $C$ as on $\mathbf{R}$),
and $S$ is a sequence taking values in $f(C)$, show by using a) and 2.02
that $S$ must have a convergent subsequence with its limit in $f(C)$.

c) Show that if $f(C)$ were not closed, or not bounded, there would
exist sequences taking values in $f(C)$ but not having any subsequence
converging to any point in $f(C)$.

d) Deduce that $f(C)$ is closed and bounded. (Notice that this is a proof
of a special case of Theorem 4.10, by essentially the same method.)

4. a) Write out the proof of Lemma 4.07. 

b) Define “bounded” and “compact” for sets in a finite-dimensional affine 
space, and show in the manner of 4.07, 4.08 how these definitions may 
be expressed in coordinates. 

5. a) If a topology on $X$ is given by a metric, prove that the induced
topology on $S \subseteq X$ is given by the induced metric (so that we really are
just transferring to topology the obvious notion in the metric case).

b) Prove that if T is open in S in the induced metric from X , and S is 
open in the metric sense on X , then T is open in X. 

c) Prove that if T is closed in S and S is closed in X, then T is closed 
in X, again using metrics. 

d) Repeat (b) and (c) using topologies and induced topologies instead of 
metrics. 

e) Prove that if $f : X \to Y$ is a continuous function, $X$, $Y$ topological
spaces, and $S$ a subspace of $X$, then $f|_S$ is continuous.

f) Prove that a sequence in a subspace $S$ of $X$ converges to $x \in S$ in the
induced topology on $S$ if and only if it converges to $x$ as a sequence
in $X$.

6. a) If $S$ is a subset of the closed interval $[a, b]$ and there is no $x \in [a, b]$
such that

(i) $y \in S \Rightarrow y \le x$

(ii) $x \in \bar S$ (cf. 1.04, 1.08 for closure)

construct a continuous function $f : [a, b] \to \mathbf{R}$ with only the values
$-1$, $+1$ such that $f(a) = -1$, $f(b) = 1$.

b) Deduce that such an $x$ must exist. ($x$ is called the supremum, $\sup S$,
of the set $S$.)

c) Deduce that $S$ has an infimum $\inf S \in \bar S$ such that $x \in S \Rightarrow x \ge \inf S$.

7. All the definitions of Chapters I-V could be made with any other field
(cf. Exercise I.1.10) substituted for $\mathbf{R}$, such as the complex or rational
numbers (though complex-valued inner products take a little care).
Show that for the rational vector space $\mathbf{Q} \times \mathbf{Q}$, where $\mathbf{Q}$ represents
the field of rational numbers and all scalars are to be rational numbers,
with the obvious addition and scalar multiplication the operator
represented by the matrix

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

has no maximal vectors and no eigenvectors, though the operator on
$\mathbf{R}^2$ represented by the same matrix has.

8. a) Show that if $0 < \lambda < 1$ and $m > n$, then $0 < \lambda^m < \lambda^n < 1$.

b) Deduce that if $y = \inf \{\, \lambda^n \mid n \in \mathbf{N} \,\}$ (cf. Exercise 6c) then

$$S : \mathbf{N} \to \mathbf{R} : i \mapsto \lambda^i$$

converges to $y$.

c) Deduce by 2.02 that $T : \mathbf{N} \to \mathbf{R} : i \mapsto \lambda^{i+1}$ converges to $\lambda y$, hence
that $\lambda y = y$.

d) Deduce that $S$ converges to 0.


VII. Differentiation and Manifolds

“There are nine and sixty ways of constructing tribal lays,
And every single one of them is right”

Rudyard Kipling

Throughout this chapter $X$, $X'$ will be affine spaces of (finite) dimensions $n$,
$m$ respectively, with difference functions $d$, $d'$ and vector spaces $T$, $T'$.


1. Differentiation 

Differentiating a function $f : \mathbf{R} \to \mathbf{R}$ gives another function
$\frac{df}{dx} : \mathbf{R} \to \mathbf{R}$, whose value at $x \in \mathbf{R}$ is (Fig. 1.1a) the slope of the
tangent at $(x, f(x))$ to the graph of $f$. Thus differentiation is an operator on
a set of functions $\mathbf{R} \to \mathbf{R}$. (Actually it is a linear operator and the functions
form an infinite-dimensional vector space, with the usual addition and scalar
multiplication.) This is a little misleading when we go to higher dimensions.
For a map $g : \mathbf{R}^2 \to \mathbf{R}$, we need two numbers to specify the “tangent” to its
graph - a plane tangent to a surface, Fig. 1.1b - over each point $(x, y) \in \mathbf{R}^2$.
Then differentiating can no longer give another function of the same kind,
$\mathbf{R}^2 \to \mathbf{R}$. (It looks more like a function $\mathbf{R}^2 \to \mathbf{R}^2$.) Writing this down in
coordinates involves “partial derivatives” like $\frac{\partial g}{\partial x}$, and for higher
dimensional domain and image, rather many of them. To disentangle what
these are actually doing, let us look at the geometry involved in
differentiating maps between affine spaces.

Fig. 1.1

Fig. 1.2

The tangent line in Fig. 1.1a and the tangent plane in Fig. 1.1b are
“flat approximations” at $(x, f(x))$ and $(x, y, g(x, y))$ to the graphs of $f$ and
$g$ respectively. Differentiating at a point $x$ means substituting for the given
map $f$ the one whose graph is this flat approximation. Or rather, since this
would be the affine map approximating $f$, the linear part of it. This is the
interesting part - we already know that $x$ goes to $f(x)$, and the value on
any point combines with the linear part to specify an affine map completely
(cf. Chapter II.§2).

Just as for functions on the real line, we find the derivative of $f$ at $x$
by looking at the value of $f$ on $(x + \Delta x)$, seeing how much this differs from
$f(x)$, and going to the limit as $\Delta x \to 0$. Now however, we have more ways in
which to move away from $x$ - classified by the tangent vectors at $x$ - and more
directions in which its image can differ from $f(x)$ - classified by the tangent
vectors at $f(x)$. Thus we have a map from the tangent space at $x$ to the
tangent space at $f(x)$. These tangent spaces were defined in Chapter II.1.02.
In the following definition the usual topology (Chap. VI.§3) is used to provide
neighbourhoods (Chap. VI.1.08).

Before coming to technical details, it is worth recalling that by
Exercise II.1.3

$$x + (t_1 + t_2) = (x + t_1) + t_2 ,$$

so we may unambiguously write both as $x + t_1 + t_2$.

1.01. Definition. If $f : X \to X'$ is a map (not necessarily affine) between
affine spaces, a derivative of $f$ at $x \in X$ is a linear map

$$D_x f : T_x X \to T_{f(x)} X'$$

such that for any neighbourhood $N$ in $L(T_x X; T_{f(x)} X')$ of the zero linear
map, there is a neighbourhood $N'$ of $0 \in T_x X$ such that if $t \in N'$ then

$$d'(f(x), f(x + t)) = d'_{f(x)}\bigl(D_x f(t) + A(t)\bigr)$$

for some $A \in N$. (That is, if we get close enough to $x$, the correction term
to make the flat approximation $D_x f$ agree with $f$ is given by an arbitrarily
small - close to zero - map. Thus the correction term $A(t)$ itself, being the
image of a small vector by a small map, is “second order small” and vanishes
in the limit.) This is illustrated for a map $f : \mathbf{R} \to \mathbf{R}^2$ in Fig. 1.3; notice that
this time the image, not the graph, of $f$ is shown.

If $f$ has a derivative at $x$ it is unique (Exercise 3) so we shall refer to the
derivative $D_x f$ of $f$ at $x$, and say $f$ is differentiable at $x$; if $f$ is differentiable
whenever it is defined, we just say $f$ is differentiable.

The derivative (if it exists) is given by the formula (Exercise 3b)

$$D_x f(t) = \lim_{h \to 0} {d'}^{\leftarrow}_{f(x)} \left( \frac{d'(f(x), f(x + ht))}{h} \right) \qquad (*)$$

(The ${d'}^{\leftarrow}_{f(x)}$ is necessary to get a tangent vector in $T_{f(x)} X'$, not a free vector
in $T'$.) Or if $X'$ is a vector space, using the canonical affine structure the
formula gives

$$D_x f(t) = \lim_{h \to 0} {d'}^{\leftarrow}_{f(x)} \left( \frac{f(x + ht) - f(x)}{h} \right) .$$

Fig. 1.3

If $X = X' = \mathbf{R}$, as in elementary calculus, any linear map between them
may be described by its slope, alias the number it multiplies points in $X$ by,
alias the value it takes on the basis vector 1. Leaving out the binding map, as is
common, the formula thus reduces to

$$\frac{df}{dx}(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$

relabelling $h = h \cdot 1$ as $\Delta x$. This is the classical expression for the derivative
in the calculus of one variable.
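
In $\mathbf{R}^n$ with its canonical affine structure the difference of two points is simply their componentwise difference, and the limit formula can be tried out numerically. The sketch below assumes this identification (binding maps suppressed), replaces the limit by a single small value of $h$, and uses an arbitrary smooth map chosen only for illustration; all names in it are hypothetical.

```python
import math

def f(p):
    """An arbitrary smooth map R^2 -> R^2, chosen only for illustration."""
    x, y = p
    return (x * x * y, x + math.sin(y))

def difference_quotient(func, p, t, h=1e-6):
    """(func(p + h t) - func(p)) / h, whose limit as h -> 0 is the derivative applied to t."""
    shifted = func(tuple(pi + h * ti for pi, ti in zip(p, t)))
    base = func(p)
    return tuple((s - b) / h for s, b in zip(shifted, base))
```

For a differentiable map the quotient is very nearly linear in $t$: its value on $t_1 + t_2$ agrees, up to an error that shrinks with $h$, with the sum of its values on $t_1$ and $t_2$.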

The geometrically precise expression $(*)$ above is thus close in spirit to
the elementary “$x + \Delta x$” approach though it uses a slightly more general
definition of limit (Exercise 1a) than that of Chap. VI. It would make a
simpler definition but for the fact that this limit may exist even when $D_x f$
does not (Exercise 2), so we have to be a little careful. However we shall
generally require differentiability as a condition before we start, so from there
on we can identify the derivative with this limit map, and not worry.
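
A standard example shows what can go wrong. The function below is an assumption chosen for illustration (the book's own Exercise 2 may use a different one): along every direction $t$ through the origin the limit of the difference quotient exists, yet the resulting map $t \mapsto$ limit is not linear, so it cannot be a derivative in the sense of 1.01.

```python
def g(x, y):
    """Equals x^2 y / (x^4 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def limit_quotient(t, h=1e-4):
    """(g(0 + h t) - g(0)) / h for a direction t = (a, b), with h small."""
    a, b = t
    return g(h * a, h * b) / h
```

The quotient vanishes on $(1, 0)$ and on $(0, 1)$ but is close to 1 on their sum $(1, 1)$, so additivity fails.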

1.02. Higher Derivatives. If a map $f : X \to X'$ is differentiable, this gives
us a map

$$Df : X \to L(T; T') : x \mapsto d'_{f(x)} \, D_x f \, d_x^{\leftarrow}$$

by forgetting to which point each tangent space is attached. If $Df$ is
continuous, we say $f$ is continuously differentiable, or $C^1$; if $Df$ is continuously
differentiable we say $f$ is $C^2$, and so on. If $f$ is $C^k$ for all finite $k$, we say $f$ is
$C^\infty$, or smooth. (Sometimes $C^0$ is used for just “continuous”, but whenever
we say $C^k$ without fixing $k$ we will assume $k \ge 1$.)

Notice that $D^2 f = D(Df)$ takes values in $L(T; L(T; T'))$ which is
naturally isomorphic (Exercise 4) to $L^2(T; T')$, and $D^k f$ similarly takes values
in $L^k(T; T') \cong T^* \otimes \cdots \otimes T^* \otimes T'$ (V.1.07, 1.08). In particular when $T' = \mathbf{R}$,
$L^k(T; T') = T^* \otimes \cdots \otimes T^*$ by definition (V.1.03). Thus tensor quantities arise
naturally from differentiation, even if we start with just scalar functions. We
shall explore this more fully on manifolds (under “covariant differentiation”).
There one is forced not to forget the distinctness of the points at which the
tangent spaces are attached, which makes the structure involved clearer. For
the moment, notice that the derivative at $x$ of $f : X \to \mathbf{R}$ gives a linear
functional $T_x X \to T_{f(x)} \mathbf{R} \cong \mathbf{R}$ whose contours (cf. III.1.02) are exactly the
local flat approximations in the tangent space to the contours of $f$ in $X$.

Fig. 1.4


Notice also, if you have previously done this material a different way,
that the directional derivative of $f$ in the direction of a tangent vector $t$ is
simply $D_x f(t)$, and in this setting hardly needs a special name.

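1.03. Partial Derivatives. These are just the components of the derivative,
once we have chosen charts ($C$ and $C'$, say) for $X$ and $X'$ (cf. II.1.08). This
fixes bases $\beta_x$, $\beta'_{f(x)}$ for $T_x X$ and $T_{f(x)} X'$ respectively so the linear map $D_x f$
is represented by a matrix in the usual way. The partial derivatives are its
entries, computed as follows.

We use the chart on $X'$ to represent $f$ as $(f^1, \ldots, f^m)$ where $f^i$ is the
composite

$$f^i = e^i \circ C' \circ f : X \to X' \to \mathbf{R}^m \to \mathbf{R} \quad \text{and} \quad e^i : (x^1, \ldots, x^m) \mapsto x^i .$$

If $\beta'_{f(x)}$ is $(c_1, \ldots, c_m)$, this means that $f(x) = x'_0 + f^i(x) c_i$ where $x'_0$ is the
origin, labelled $(0, \ldots, 0)$, according to the chart $C'$. If then $\beta_x = (b_1, \ldots, b_n)$,

$$D_x f(b_j) = \lim_{h \to 0} {d'}^{\leftarrow}_{f(x)} \left( \frac{d'(f(x), f(x + h b_j))}{h} \right) \quad \text{by 1.01.}$$

The components of this vector in $T_{f(x)} X'$ with respect to $\beta'_{f(x)}$ are exactly
those of its image by $d'_{f(x)}$ in $T'$, with respect to $\beta'$. (That is how $\beta'_{f(x)}$ is
defined from $\beta'$.) So since

$$\begin{aligned}
\frac{d'(f(x), f(x + h b_j))}{h}
&= \frac{d'(x'_0 + f^i(x) c_i ,\; x'_0 + f^i(x + h b_j) c_i)}{h} \\
&= \frac{f^i(x + h b_j) c_i - f^i(x) c_i}{h} \\
&= \frac{(f^i(x + h b_j) - f^i(x)) c_i}{h}
\end{aligned}$$

and ${d'}^{\leftarrow}_{f(x)}$ (being linear and hence continuous) preserves limits, we see that if
$D_x f(b_j) = f^i_j(x) c_i$ we have

$$f^i_j(x) = \lim_{h \to 0} \frac{f^i(x + h b_j) - f^i(x)}{h} .$$

These matrix entry functions $f^i_j$ are normally denoted by $\frac{\partial f^i}{\partial x^j}$, or $\partial_j f^i$ for
short, where $x^1, \ldots, x^n$ are the coordinates on $X$ given by the chart $C$. Notice
that if we identify the point $x \in X$ with its label $(x^1, \ldots, x^n) \in \mathbf{R}^n$, as is
common, the equation above becomes

$$\partial_j f^i(x) = \frac{\partial f^i}{\partial x^j}(x^1, \ldots, x^n) = \lim_{h \to 0} \frac{f^i(x^1, \ldots, x^j + h, \ldots, x^n) - f^i(x^1, \ldots, x^j, \ldots, x^n)}{h} .$$

If the limit in this equation (written either way) exists, we shall call it
$\partial_j f^i$ and “partial derivative” even if $D_x f$ does not exist, cf. Exercise 7.1c.
The matrix $[\partial_j f^i(x)]$, or less abbreviatedly

$$\begin{bmatrix}
\frac{\partial f^1}{\partial x^1}(x) & \cdots & \frac{\partial f^1}{\partial x^n}(x) \\
\vdots & & \vdots \\
\frac{\partial f^m}{\partial x^1}(x) & \cdots & \frac{\partial f^m}{\partial x^n}(x)
\end{bmatrix}$$

representing $D_x f$, is the celebrated Jacobian matrix of the map $f$ at $x$. If
$X'$ is $\mathbf{R}$ itself, then $(f^1, \ldots, f^m)$ collapses effectively to $f$, so we have partial
derivatives $\partial_j f$ and a matrix

$$[\partial_1 f, \partial_2 f, \ldots, \partial_n f] .$$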
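
The limit defining each entry translates directly into a finite-difference approximation of the Jacobian matrix. The sketch below assumes points are already identified with their coordinate labels in $\mathbf{R}^n$, and replaces the limit by one small step $h$; the function names and the polar-coordinate example are assumptions made for illustration.

```python
import math

def jacobian(func, x, h=1e-6):
    """Approximate the m x n matrix whose (i, j) entry is d_j f^i at the point x."""
    base = func(x)
    columns = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += h  # move only the j-th coordinate
        moved = func(shifted)
        columns.append([(moved[i] - base[i]) / h for i in range(len(base))])
    # columns[j][i] holds d_j f^i; transpose so that rows are indexed by i.
    return [[columns[j][i] for j in range(len(x))] for i in range(len(base))]

def polar(p):
    """Example map R^2 -> R^2, (r, theta) -> (r cos theta, r sin theta)."""
    r, theta = p
    return [r * math.cos(theta), r * math.sin(theta)]
```

For a real-valued function the result collapses to a single row, matching the matrix $[\partial_1 f, \ldots, \partial_n f]$ above; for instance `jacobian(lambda p: [p[0] * p[1]], [3.0, 4.0])` is close to `[[4.0, 3.0]]`.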

If on the other hand $X$ is $\mathbf{R}$, while $X'$ is some other space, the derivatives
are no longer “partial” as $x$ has only one direction to change in. The entries
are usually given a different notation, $\frac{df^i}{dx}(x)$, or (less ambiguously) $\frac{df^i}{dt}(x)$,
for $x \in \mathbf{R}$. The Jacobian matrix takes the form

$$\begin{bmatrix} \frac{df^1}{dt}(x) \\ \vdots \\ \frac{df^m}{dt}(x) \end{bmatrix} .$$

As usual, the entries in such a “column matrix” are just the components
of the vector to which the single basis vector of the domain is carried. In
this case the single basis vector is the unit vector $e_x = d_x^{\leftarrow}(e_1)$ ($e_1$ being
the ordered 1-tuple $(1)$, considered as the single standard basis vector for
$\mathbf{R} = \mathbf{R}^1$ as a real vector space, cf. I.1.10). Its image, which determines $D_x f$
completely, is denoted by $f^*(x)$. This is the only $*$ we attach to a symbol
not previously denoting a vector, which should help the reader to remember
that $f^*(x)$, unlike $f(x)$, is a vector.

Notice again that the vector $f^*(x)$ has the same components, relative to
$\beta'_{f(x)}$, as the map $D_x f : T_x \mathbf{R} \to T_{f(x)} X'$, relative to $\{e_x\}$ and $\beta'_{f(x)}$. This
can encourage confusion when working in coordinates, particularly when $X'$
is also $\mathbf{R}$ and $\frac{df}{dt}(x)$ is the “slope of tangent” mentioned at the beginning
of the chapter. We have three entities - the linear map $D_x f$, the number
$\frac{df}{dt}(x) \in \mathbf{R}$ which is the unique entry in the $1 \times 1$ matrix representing $D_x f$,
and the vector $f^*(x)$ - which are geometrically quite distinct though
“componentwise” indistinguishable. By $\frac{df}{dt}$ with no $(x)$, we will mean the
function $\mathbf{R} \to \mathbf{R} : x \mapsto \frac{df}{dt}(x)$. When we use its value at $x$ we always write it as

$$\frac{df}{dt}(x) , \text{ not } \frac{d}{dt}(f(x)) \text{ or worse } \frac{d}{dx}(f(x))$$

as $f(x)$ is just a number, and cannot be differentiated. For a function of $x$
like, say

$$f : x \mapsto \int_a^b g(x, s) h(x, 0, s) \, ds , \text{ where } g : \mathbf{R}^2 \to \mathbf{R}, \; h : \mathbf{R}^3 \to \mathbf{R}$$

we write the differentiated function as

$$\frac{d}{dt} \int_a^b g(\;\;, s) h(\;\;, 0, s) \, ds$$

and its value at 0 as

$$\left( \frac{d}{dt} \int_a^b g(\;\;, s) h(\;\;, 0, s) \, ds \right)(0)$$

rather than

$$\frac{d}{dt} \int_a^b g(0, s) h(0, 0, s) \, ds ,$$

which would be ambiguous.

Similar rules will apply to the “covariant” differential operator intro¬ 
duced in the next chapter. 

The Jacobian determinant of $f$ at $x$ is the determinant of the Jacobian
matrix (only defined when $m = n$). Like the matrix, this depends on choice
of coordinates (since we could change at one end but not the other, det is
only invariant for operators (1.3.12)) but whether it is zero does not. This
is very useful. For example, if $f$ is $C^1$ and $D_x f$ is non-singular at some
$x = x_0$, then its determinant is non-zero there, hence by continuity of det and $Df$ it must
be non-zero in some neighbourhood of $x_0$. (Why? If you are not clear, prove
this in detail by putting the definitions carefully together.) So we have $D_x f$
an isomorphism not just at $x = x_0$ but for all $x$ in some neighbourhood of $x_0$.
This “spreading out” from a point to a neighbourhood is a typical, powerful 
trick of differential topology; it gives us in particular the following very useful 
theorem. 

1.04. Theorem (Inverse Function Theorem). If $f$ is $C^k$, for any $k$, $D_x f$ is
an isomorphism (so we want $n = m$) if and only if there are neighbourhoods
$N$ of $x$, $N'$ of $f(x)$ such that $f(N) = N'$ and we have a local $C^k$ inverse
$f^{\leftarrow} : N' \to N$. (That is, $f^{\leftarrow} \circ f = I_N$, $f \circ f^{\leftarrow} = I_{N'}$.)

We leave the proof of this result to the Appendix, since we there erect, 
anyway, machinery which permits a very efficient proof. An understanding 
of the proof is not in any way essential to an understanding of the result, 
which is not easy to doubt once understood. 

1.05. Corollary. If $f : X \to X'$ is $C^1$ and $D_x f$ is injective, then there
is a neighbourhood $N$ of $x$ such that $f|_N$ is injective. (Thus we want
$\dim X \le \dim X'$ for either to be possible.)

This is a sufficient but not a necessary condition (Exercise 7). 

Proof. Let $\dim X = n$, $\dim X' = m$.

Choose a basis $b_1, \ldots, b_n$ for $T_x X$. If $D_x f$ is injective, $(D_x f)b_1, \ldots,
(D_x f)b_n$ are linearly independent, so we can extend them to a basis

$$\beta = (D_x f)b_1, \ldots, (D_x f)b_n, c_1, \ldots, c_{m-n}$$

for $T_{f(x)}X'$. Define an affine map

$$A : X' \to \mathbf{R}^n : \bigl(f(x) + a^i (D_x f)b_i + b^j c_j\bigr) \mapsto (a^1, \ldots, a^n)$$

using the chart on $X'$ induced by $\beta$ and the choice of $f(x)$ as origin. Then
clearly $D_x(A \circ f)$ is injective too, being $D_x f$ composed with the linear part of
$A$, which takes non-zero image vectors of $D_x f$ to non-zero vectors by construction.
Hence it is an isomorphism, since $\dim(T_{A(f(x))}\mathbf{R}^n) = \dim T_x X$. By the
Theorem there exist neighbourhoods $N$, $N'$ of $x$, $A(f(x))$ and $\phi : N' \to N$
such that $\phi \circ (A \circ f|_N) = I_N$. That is, $(\phi \circ A) \circ (f|_N) = I_N$, so $f|_N$ is injective. □

Both these results are local, asserting things only on neighbourhoods
which may be very small, not global. They do not assert that $f$ is invertible
or injective as a whole map, even if $D_x f$ is invertible or injective for all $x$. (For
example, the map $\mathbf{R}^2 \to \mathbf{R}^2$ taking $(x, y)$ to the point - using complex labels -
$e^{x+iy}$ is locally invertible and injective everywhere, but takes infinitely many
points to every point in $\mathbf{R}^2$ except $(0,0)$. Work out what is happening in this
example if you are not familiar with it; it is illuminating.)
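The complex exponential example can be checked numerically. The following is only a sketch, assuming NumPy is available; the function names `f` and `jacobian` and the sample points are illustrative choices, not taken from the text. It evaluates the Jacobian determinant at a few points (it equals $e^{2x}$, so it never vanishes) and then exhibits two distinct points with the same image.

```python
import numpy as np

def f(p):
    """The map R^2 -> R^2 given in complex labels by x + iy |-> e^(x+iy)."""
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def jacobian(p):
    """Jacobian matrix of f at p, written out from the formula for f."""
    x, y = p
    ex = np.exp(x)
    return np.array([[ex * np.cos(y), -ex * np.sin(y)],
                     [ex * np.sin(y),  ex * np.cos(y)]])

# det D_p f = e^(2x) > 0 at every point, so D_p f is always an isomorphism ...
points = [np.array([0.3, -1.2]), np.array([-2.0, 4.0]), np.array([1.5, 0.0])]
dets = [np.linalg.det(jacobian(p)) for p in points]

# ... yet f is not injective: shifting y by 2*pi gives the same image point.
p = np.array([0.3, -1.2])
q = p + np.array([0.0, 2 * np.pi])
same_image = np.allclose(f(p), f(q))
```

So the hypothesis of 1.04 holds at every point, while global injectivity fails.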

Both results amount to saying that the linear approximation $D_x f$ is
worth making. That is, since $D_x f$ is supposed to be “arbitrarily close to” $f$
in a sufficiently small neighbourhood of $x$, the properties of being injective
or an isomorphism carry over. When the algebraic condition of injectivity on
$D_x f$ fails, more elaborate approximations (Taylor expansions) are needed for
a good local description of $f$. Recent results give straightforward algebraic
criteria for some $k$, and guarantee that the approximation is locally perfect
up to a smooth change of coordinates. For an elementary introduction, see
[Poston and Stewart].


Exercises VII. 1 

1. a) Suppose we have Hausdorff topological spaces $X$, $Y$ and any map (not
necessarily continuous or everywhere defined) $f : X \to Y$. Now define

$\lim_{x \to p} (f(x)) = q$ if and only if for any neighbourhood $N(q)$
of $q$ we can find a neighbourhood $N(p)$ of $p$ such that, if
$x \in N(p)$ and $f(x)$ is defined, then $f(x) \in N(q)$. Draw a
picture!

Show that if $x_i$ is a sequence in $X$ with $\lim_{i \to \infty} x_i = p$, $f(x_i)$ defined
for infinitely many $i \in \mathbf{N}$, and $\lim_{x \to p} (f(x))$ exists, then

$$\lim_{i \to \infty} f(x_i) = f(p),$$

if $f(p)$ is defined.

b) If X is the set of natural numbers 1,2,3,... together with one extra 
element which we label oo (it can be anything - for instance this book) 
find a topology on X which makes Definition VI.2.01 a special case of 
the one above. 

2. Consider / : R 2 —► R such that 

/(«■») = ( w “ p ,<) ) 

^ 0 otherwise. 

a) Draw a picture of /. 

b) Show that for any vector v £ R 2 , lim/^o exists and is zero, 

using the definition in Exercise la) of “limit” and the usual topology 
on R 2 and R. 
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c) Show that if v,* = (},$) we have lim^oo t;,* = 0, but that in the 
notation of 1.01 if x = 0 we have df(f(x + Vi) y f(x)) = j. 

d) Find a neighbourhood N of the zero map R 2 —► R such that 

A E N => A( ,, p ) ^ t • 

e) Deduce that $f$ has no derivative at $(0,0)$. (If we put $f(x,y) = 1$ for
$x^2 < y < 3x^2$, 0 otherwise, $f$ would still have all partial derivatives
$\partial_j f(0,0)$ without even being continuous at $(0,0)$. We need the more
complicated function above as a counterexample later.)

3. Show that if a map $f : X \to X'$ between affine spaces has a derivative
$D_x f$ at $x \in X$,

a) $D_x f$ is unique (so if a linear map $D'_x f$ also satisfies the definition,
$D'_x f = D_x f$)

b) D x f(t) = lim ^ ^. Note that as h -* 0 we 

are forcing the linear map A in the Definition 1.01 towards the zero 
map. 

(Hint: to get a quantitative grip on the neighbourhoods involved and
make possible a proof by epsilontics, choose norms arbitrarily - any
norm will give, by Exercise VI.4.8, the usual topology in finite
dimensions - on $T$ and $T'$, take the corresponding norm (V.4.01) on
$L(T; T')$, and express the limits in these terms.)

c) Construct an example in the style of Exercise 2 to show that Theo¬ 
rems 1.04, 1.05 become false if we substitute D x f y where 

for D x f. (Thus we need the existence of D x f y not just D x f .) 

d) If / is differentiable at x, it is continuous at x. (Hint: otherwise not 
even D x f could exist.) 

e) If $f$ is an affine map, then $D_x f$ is the linear part of $f$. (In particular,
we may treat a linear map as being its own derivative: cf. Exercise II.3.8.)

4. If $A$ is a linear map $T \to L(T; T')$, define

$$A' : T \times T \to T' : (x, y) \mapsto (A(x))y$$

and prove:

a) $A'$ is bilinear.

b) The map $L(T; L(T; T')) \to L^2(T; T') : A \mapsto A'$ is a vector space
isomorphism.
c) Similarly, prove that $L(T; L(T; \ldots; L(T; T') \ldots)) \cong L^k(T; T')$.

5. a) If functions $f$, $g$ defined in a neighbourhood of $x$ in an affine space $X$,
taking values in a vector space $Y$, are differentiable at $x$, then so is
$f + g$, and we have, for any $a \in \mathbf{R}$,

$$D_x(f + g) = D_x f + D_x g, \qquad D_x(af) = a(D_x f)$$

as linear maps.

Thus $D$ is a linear map from the (infinite-dimensional) vector space of
differentiable maps $X \to Y$ to the space of maps $X \to L(T; Y)$, where
$T$ is the vector space of $X$.

b) If $f$, $g$ are functions $X \to \mathbf{R}$, $X$ affine, show from the definitions
that $D_x(fg) = (D_x f)g + f(D_x g)$, where $(fg)(x) = f(x)g(x)$, treating
$D_x f$, $D_x g$ and $D_x(fg)$ as taking values in $\mathbf{R}$. (Insert the appropriate
freeing maps if desired.) In other terms

$$\partial_i(fg)(x) = \partial_i f(x)\,g(x) + f(x)\,\partial_i g(x).$$

This fact, in one notation or another, should have been familiar from
school onwards. Its usual name is Leibniz’s rule, though some books,
for example [Misner, Thorne and Wheeler], call it and its generalisations
to tensors the chain rule - a name we reserve to its more usual
meaning (Exercise 6).

6. Show that if the maps $f : X \to Y$, $g : Y \to Z$ between affine spaces
are differentiable at $x \in X$, $f(x) \in Y$ respectively, then $g \circ f$ is
differentiable at $x$ and

$$D_x(g \circ f) = (D_{f(x)} g) \circ (D_x f).$$

(This is known as the chain rule for differentiation.)

Deduce that if $f$ and $g$ are $C^k$, then so is $g \circ f$.

7. a) Use the function $x \mapsto x^3$ to show that the “if” of 1.05 cannot be
strengthened to “if and only if”.
b) Show that

$$f : \mathbf{R} \to \mathbf{R} : x \mapsto \begin{cases} x^2 \sin\left(\frac{1}{x}\right) + x & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

is differentiable everywhere (draw it!) but not $C^1$.
c) Show that $f$ has no inverse in any neighbourhood of 0, though $D_0 f$ is
an isomorphism. (This illustrates why $C^1$ is so much more powerful a
condition than just “differentiable”: the difference is much more than
between $C^1$ and $C^\infty$.)
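A numerical hint for Exercise 7 may help before attempting the proof. The sketch below assumes the function in 7b is $x \mapsto x^2 \sin(1/x) + x$ (with value 0 at 0); the helper `negative_slope_point` is a hypothetical name introduced here. It produces points tending to 0 at which the derivative is negative, although the derivative at 0 itself is 1, so the slope changes sign in every neighbourhood of 0.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) + x if x != 0.0 else 0.0

def df(x):
    # For x != 0 by the usual rules; at 0 the limit definition gives 1.
    if x == 0.0:
        return 1.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) + 1.0

def negative_slope_point(n):
    """A point x_n > 0, tending to 0 as n grows, at which df(x_n) < 0."""
    u = 2.0 * math.pi * n      # here cos(u) = 1 and sin(u) = 0
    u = u - 2.0 / u            # step slightly to the left of that value
    return 1.0 / u

xs = [negative_slope_point(n) for n in (10, 100, 1000)]
slopes = [df(x) for x in xs]   # all negative, while df(0.0) == 1.0
```

Since a continuous injective function on an interval must be monotone, a sign change of the derivative arbitrarily close to 0 rules out a local inverse there.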

2. Manifolds

In constructing charts on affine spaces (II.1.08) we remarked that on, for 
example, the earth we could not do so globally. (That is, all over the globe -
hence the word. In general we use it to mean “all over the manifold we are 
considering”, which may for instance be the whole of spacetime.) We can 
however do it locally. Around any point on the earth we have no trouble in 
drawing charts of the immediate locality - it is only when we try to cover 
the whole earth that we are forced into complications like Fig. 2.1. 

The same applies to any smooth closed surface in R 3 ; locally we can 
choose coordinates and make it look like a piece of R 2 (Fig. 2.2), globally 




Fig. 2.2 
not. Now all the definitions of the last section depended only on having a
function / defined in some neighbourhood of the point x we were interested 
in, since in the course of taking limits we eventually disregarded everything 
outside any particular neighbourhood. (At this point, strictly speaking, we 
should write out all the definitions again with / defined on an open set in an 
affine space, instead of the whole space. What we shall actually do is to talk 
as though this rewriting has been done.) So a local resemblance to an affine 
space is all we need to set up the differential calculus. The existence of such 
a resemblance is exactly what we require in defining a manifold. 

2.01. Definition. A $C^k$ manifold modelled on an affine space $X$ (sometimes,
in particular, $\mathbf{R}^n$) is a Hausdorff topological space $M$ together with a collection
of open sets $\{ U_a \mid a \in A \}$ in $M$ and corresponding maps $\phi_a : U_a \to X$,
such that

M i) $\bigcup_{a \in A} U_a = M$.

M ii) Each $\phi_a$ defines a homeomorphism $U_a \to \phi_a(U_a)$.

M iii) If $U_a \cap U_b \neq \emptyset$, then the composites $\phi_a \circ \phi_b^{\leftarrow}$, $\phi_b \circ \phi_a^{\leftarrow}$ on the sets
$\phi_b(U_a)$, $\phi_a(U_b)$ on which they are defined (Fig. 2.3), are $C^k$. (We
deduce from M ii) they are homeomorphisms; we are requiring them
to be differentiable $k$ times as well.)

The pairs $(U_a, \phi_a)$ are called charts on $M$, and the set $\{ (U_a, \phi_a) \mid
a \in A \}$ of all of them is an atlas. Exercise 1 is concerned with some specific
examples of manifolds and atlases.
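Axiom M iii) can be made concrete on the sphere $S^2$ with two of the hemisphere charts from Exercise VII.2.1. The following sketch, assuming NumPy, compares a finite-difference Jacobian of one transition map with the formula obtained by differentiating the square root; the function names and the sample point are illustrative choices, not from the text.

```python
import numpy as np

def phi_1_plus(p):
    """Chart on the hemisphere x1 > 0: drop the first coordinate."""
    return np.array([p[1], p[2]])

def phi_1_plus_inv(c):
    """Inverse of phi_1_plus on the open unit disc."""
    x2, x3 = c
    return np.array([np.sqrt(1.0 - x2**2 - x3**2), x2, x3])

def phi_2_minus(p):
    """Chart on the hemisphere x2 < 0: drop the second coordinate."""
    return np.array([p[0], p[2]])

def transition(c):
    """phi_2_minus o phi_1_plus^{-1}, defined where x2 < 0 in the disc."""
    return phi_2_minus(phi_1_plus_inv(c))

def transition_jacobian(c):
    """Jacobian of the transition map, from differentiating the square root."""
    x2, x3 = c
    r = np.sqrt(1.0 - x2**2 - x3**2)
    return np.array([[-x2 / r, -x3 / r],
                     [0.0, 1.0]])

c = np.array([-0.5, 0.3])           # chart coordinates of a point in the overlap
p = phi_1_plus_inv(c)               # the corresponding point of S^2
h = 1e-6
numeric = np.column_stack([
    (transition(c + h * e) - transition(c - h * e)) / (2 * h)
    for e in np.eye(2)
])
```

The square root is smooth away from the boundary of the disc, which is why this transition map is $C^\infty$ on the overlap.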

A new chart $(U, \phi)$, beyond those we have specified in defining $M$, is
called admissible if for all $a \in A$, the maps $\phi \circ \phi_a^{\leftarrow}$ and $\phi_a \circ \phi^{\leftarrow}$ are $C^k$
whenever they are defined. $M$ is not changed in any significant way if we
enlarge the family $\{ (U_a, \phi_a) \mid a \in A \}$ by adding admissible charts, and we
shall feel free to do so.

It will often be convenient, for a particular $x \in M$, to consider a chart
$\phi : U \to \mathbf{R}^n$ with $\phi(x) = (0, \ldots, 0)$ (Exercise 1g); a chart around $x$ will always
mean this, unless otherwise stated.

To shorten the statements of definitions and theorems we shall generally
confine ourselves to $C^\infty$, or smooth, manifolds; very little is lost by this. By
“manifold” we mean “smooth manifold” unless otherwise stated.

The dimension $\dim(M)$ of $M$ is the dimension of the affine space it
is modelled on. We often call $M$ an $n$-manifold if we want to specify its
dimension. (Thus, a 2-manifold, or surface, is a manifold modelled on the
plane. It need not be the “surface of” anything.)

The axioms M i) - M iii) are natural enough; M i) just says that no point
in $M$ is “uncharted”, M ii) that the charts are topologically uncomplicated,
relative to the topology on $M$, and M iii) that they are differentially nice
($C^k$) relative to each other. We cannot ask that they be $C^k$ individually,
because we do not yet have a notion of differentiation of maps defined on
$M$; the one we are about to define depends precisely on the compatibility
of the charts, which give a “differential structure” on $M$. (It is possible
for a given topological manifold - something satisfying just M i) and M ii) -
to have many essentially different differential structures. For example the
seven-dimensional sphere $S^7$ has 28, and the thirty-one dimensional sphere
$S^{31}$ has over 16 million. Certain topological manifolds admit none at all.)
On the other hand we must exclude situations like Fig. 2.4 in which a map
$f : \mathbf{R} \to M$ gives a differentiable map $\mathbf{R} \to X$ when composed with one
chart, $\phi_b$, but not when composed with another. If that kind of thing can
happen we cannot hope to give differentiability of $f$ itself a meaning
independent of our choice of chart. Having in the affine setting separated
differentiability from choice of charts so effectively, this would be a shame.
Axiom M iii) is exactly sufficient to stop it happening. (Cf. Exercise 2, one
of the most essential exercises in this book. Do 2a and 2b, or at least be
quite sure you understand what they say, before reading further.) This lets
us make

2.02. Definition. A map $f : M \to N$ between smooth manifolds is differentiable
(respectively $C^k$) at $x \in M$ if for some charts $(U, \phi)$ on $M$, $(V, \psi)$
on $N$, with $x \in U$, $f(x) \in V$, the map $\psi \circ f \circ \phi^{\leftarrow}$ (Fig. 2.5) is differentiable
(respectively $C^k$) at $\phi(x)$. (Exercise 2 guarantees that this definition
is independent of our choice of charts.)

A homeomorphism $f : M \to N$ between $C^k$ manifolds is a $C^k$ diffeomorphism
if both $f$ and $f^{\leftarrow}$ are $C^k$. (Note if $f : \mathbf{R} \to \mathbf{R}$ has $f(x) = x^3$,
then $f$ is a homeomorphism and $C^\infty$, but $f^{\leftarrow}$ is not differentiable at 0. So
the $C^k$ condition on $f^{\leftarrow}$ does not follow from $f$ being $C^k$.) If there is a
diffeomorphism between two manifolds they are diffeomorphic.
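The failure of differentiability of the inverse of $x \mapsto x^3$ at 0 shows up directly in difference quotients. A minimal sketch in plain Python follows; the name `f_inv` for the real cube root is an illustrative choice.

```python
import math

def f(x):
    return x ** 3

def f_inv(y):
    """Real cube root, the inverse of f on all of R."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

# Difference quotients of f_inv at 0: (f_inv(h) - f_inv(0)) / h = h^(-2/3),
# which grows without bound as h -> 0, so no derivative exists there.
quotients = [f_inv(h) / h for h in (1e-3, 1e-6, 1e-9)]
```

The quotients grow like $h^{-2/3}$, so no linear map can approximate $f^{\leftarrow}$ at 0 in the sense of Definition 1.01.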

2.03. Tangent Spaces. Now, differentiability of $f$ ought reasonably to mean
the existence of a derivative for $f$ itself, not just for various maps $\psi \circ f \circ \phi^{\leftarrow}$,
and so it will - once we have said what a derivative is now supposed to be.

Clearly, we shall want $D_x f$ to be, as before, the linear part of a flat
approximation to $f$ at $x$. For the linearity to be definable, $D_x f$ must therefore
be a map between vector spaces, attached as before to the points $x$ and $f(x)$.
Thus we need to attach tangent spaces to points in a manifold, like the ones
we have been using attached to points in an affine space.

As with tribal lays, there are very many approaches to constructing
tangent spaces, which are all right ways. That is, they all give naturally
isomorphic results (in the strong, technical sense of the word “natural”) and
they all illuminate one or another aspect of what is going on. We shall sketch
the most geometrical, and give as Exercise 3 a particular formal construction
chosen (i) because it is the one that requires no more machinery than we
have already to hand, and (ii) because it is the rigorous replacement of the
traditional “definition” (Exercise 4), which a physicist must be at home with
to be able to understand his more unreconstructed colleagues and older books.

Fig. 2.5

It is a (non-trivial) fact that any finite-dimensional manifold can be
mapped smoothly and injectively into $\mathbf{R}^n$, for some sufficiently high $n$.
Establishing the lowest value of $n$ that is sufficient for a given manifold is one of
the major preoccupations of the differential topologists. For example, among
2-manifolds the sphere and torus can sit nicely in $\mathbf{R}^3$, but the Klein bottle
(Fig. 2.6) always has a self-intersection in three dimensions and cannot be
mapped continuously and injectively into a less than four-dimensional
Euclidean space. (This object, by the way, was originally in unfrivolous 19th
Century fashion the Kleinsche Fläche - Klein surface - but this was
mistaken by an English translator for Kleinsche Flasche - Klein bottle - and
the error took hold so strongly that now the Germans too call it a Flasche.)

Fig. 2.6

In general, if $\dim(M) = m$, we may need $n$ up to $2m + 1$, but never more,
if we are concerned only with the differential structure on $M$ (and not, for
instance, with a metric as well); this is the Whitney Embedding Theorem,
see [Guillemin and Pollack]. All we need here is that for some $n$, $M$ can be
thought of as a subset of $\mathbf{R}^n$ in a nice way (like the manifolds of Exercise 1)
because we can then define the tangent space to $M$ at $x \in M$ to be the affine
subspace $T_x M$ of $\mathbf{R}^n$ that is geometrically tangent to $M$ at $x$. Or rather, since
we want a vector space, we use the tangent space $T_x(T_x M)$ in the affine space
sense we already have. We shall call this $T_x M$. By Exercise 5 it is canonically
isomorphic to the space defined in detail in Exercise 3 (and hence, also,
independent of the embedding) and gives us a nice picture of it. We shall often
use this kind of picture in drawing illustrations, though the definition in
Exercise 3 is more convenient for formal proofs and calculations. This is just like
the interplay between geometric thinking and algebraic proof we have used
for vector spaces. We do not give the strict definition of “embedding” here,
which involves some technicalities to disallow pictures like Fig. 2.8, since we
shall use the embedded picture throughout for illustration only, not proofs.

Fig. 2.8

We can also embed manifolds in each other. If $M$ is embedded in $N$
we may call it a submanifold of $N$. If then $\dim M = \dim N - 1$, so that
each tangent space $T_x M$ is a hyperplane in $T_x N$ (1.1.09), $M$ is a hypersurface
in $N$.

We can now once again interpret differentiability at $x$ of a map $f$, now
between manifolds $f : M \to N$, as the existence of a derivative - that is, a
linear map

$$D_x f : T_x M \to T_{f(x)} N$$

which locally approximates $f$. The formal details are in Exercise 6. The
idea is simply to look at regions of $M$ and $N$ around $x$ and $f(x)$ which are
small enough to mistake via charts for pieces of affine space, and transfer to
manifolds the affine space notion of derivative. In a similar way we can define
higher derivatives, with $D^k_x f \in L^k(T_x M; T_{f(x)} N)$.

Notice that, unlike the affine space situation where each $T_x X$ had a
canonical isomorphism $d_x$ with the vector space $T$ of $X$, for a manifold $M$
modelled on $X$ and $x \in M$ there is no such natural choice of isomorphism
$T_x M \to T$. Any chart $(U, \phi)$ with $x \in U$ gives an isomorphism

$$d_{\phi(x)} \circ D_x \phi : T_x M \to T_{\phi(x)} X \to T,$$

but any other isomorphism $T_x M \to T$ can equally be realised by an admissible
chart. (Why? Prove it by composing an affine map with $\phi$.) Thus we
cannot reasonably identify $T_x M$ with $T$, any more than we identify $T$ with
its dual $T^*$. In consequence, we cannot identify the tangent spaces $T_x M$,
$T_y M$ at different points in $M$ with each other. That is, we cannot “forget
to which point each tangent space is attached.” Further implications of this
will appear in the next section.
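The chart dependence of the isomorphism $T_x M \to T$ can be seen in a small computation. The sketch below, assuming NumPy, takes the open right half-plane with two charts, Cartesian and polar (an illustrative choice, not an example from the text), and shows that the same tangent vector has different components in each, related by the Jacobian of the transition map.

```python
import numpy as np

def to_polar(p):
    """Transition map from Cartesian (x, y) to polar (r, theta), for x > 0."""
    x, y = p
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def to_polar_jacobian(p):
    """Jacobian matrix of the transition map at p."""
    x, y = p
    r2 = x * x + y * y
    r = np.sqrt(r2)
    return np.array([[x / r, y / r],
                     [-y / r2, x / r2]])

u = np.array([1.0, 1.0])       # Cartesian coordinates of the point
t = np.array([0.0, 2.0])       # components of a tangent vector in the Cartesian chart

# Components of the same tangent vector in the polar chart.
t_polar = to_polar_jacobian(u) @ t

# Cross-check: differentiate the polar coordinates of the curve s |-> u + s t at s = 0.
h = 1e-6
t_polar_numeric = (to_polar(u + h * t) - to_polar(u - h * t)) / (2 * h)
```

Neither set of components is privileged, which is the practical content of the remark that $T_x M$ cannot be identified with $T$.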


Exercises VII. 2 

1. a) If $S^2 = \{ (x^1, x^2, x^3) \in \mathbf{R}^3 \mid (x^1)^2 + (x^2)^2 + (x^3)^2 = 1 \}$, and

$$U_{i+} = \{ (x^1, x^2, x^3) \in S^2 \mid x^i > 0 \}, \quad i = 1, 2, 3$$

$$U_{i-} = \{ (x^1, x^2, x^3) \in S^2 \mid x^i < 0 \}, \quad i = 1, 2, 3$$

are the six open hemispheres obtained by slicing through $S^2$ with
coordinate planes (draw them!) show that the six “flattening maps”,
such as

$$\phi_{1+} : U_{1+} \to \mathbf{R}^2 : (x^1, x^2, x^3) \mapsto (x^2, x^3)$$

and

$$\phi_{2-} : U_{2-} \to \mathbf{R}^2 : (x^1, x^2, x^3) \mapsto (x^1, x^3)$$

constitute an atlas for $S^2$ making it a smooth manifold.

b) Show that the surface $\{ \mathbf{x} \mid \mathbf{x} \cdot \mathbf{x} = 1 \}$ of Fig. IV.2.3b is a 2-manifold,
with one chart for each component. Show that $\{ \mathbf{x} \mid \mathbf{x} \cdot \mathbf{x} = -1 \}$ of
Fig. IV.2.3c is also a 2-manifold, by finding an atlas for it. Generalise
this to show that in any metric vector space, $\{ \mathbf{x} \mid \mathbf{x} \cdot \mathbf{x} = a \}$ is
a manifold whenever $a \neq 0$.

Find atlases making the following into smooth manifolds:

c) The sets of positions of a unit rod in the plane, and in R 3 . 

d) The sets of positions of a unit circle in R 3 . 

e) The set of all possible circles in R 3 . 

f) The set of ellipses in R 3 with one focus at (0,0,0). 

N.B.: None of these can be covered by a single chart (though to 
prove this rigorously is non-trivial). Case f is the first abstract (non- 
embedded) manifold ever considered: the space of Keplerian orbits 
around a body centered at the origin. Space engineers use various 
atlases, and regret deeply the lack of a single, smooth, unredundant 
complete way to define the coordinates or “elements” of an orbit. 

g) Show that for a chart $\phi : U \to \mathbf{R}^n$ and $x \in U$, there is an affine
map $A : \mathbf{R}^n \to \mathbf{R}^n$ such that $A \circ \phi$ is an admissible chart taking $x$ to
$(0, \ldots, 0)$.

2. Let $M$ be a $C^k$ manifold modelled on an affine space $X$, and $(U_a, \phi_a)$,
$(U_b, \phi_b)$ charts on $M$ with $U_a \cap U_b \neq \emptyset$.

a) If $U$ is an open subset of an affine space $Y$, and $f$ a map $U \to M$, use
Exercise 1.6 and the Inverse Function Theorem (1.04) to show that
for any $x \in U$ with $f(x) \in U_a \cap U_b$ (so that both sides are defined) we
have, for all $j = 1, \ldots, k$,

$\phi_a \circ f$ is $C^j$ at $x$ $\iff$ $\phi_b \circ f$ is $C^j$ at $x$.

b) If $W$ is an open subset of $M$, and $f$ is a map from $W$ to an affine
space $Z$, show that for any $x \in W \cap (U_a \cap U_b)$ and for all $j = 1, \ldots, k$,

$f \circ \phi_a^{\leftarrow}$ is $C^j$ at $\phi_a(x)$ $\iff$ $f \circ \phi_b^{\leftarrow}$ is $C^j$ at $\phi_b(x)$.

c) Deduce that in the situation of Definition 2.02, if $(U', \phi')$ and $(V', \psi')$
are also charts on $M$, $N$, with $x \in U'$, $f(x) \in V'$, then

$\psi' \circ f \circ \phi'^{\leftarrow}$ is $C^k$ at $\phi'(x)$ $\iff$ $\psi \circ f \circ \phi^{\leftarrow}$ is $C^k$ at $\phi(x)$.

3. Let $(U, \phi)$, $(U', \phi')$ be charts on a smooth manifold $M$, modelled on an
affine space $X$ with vector space $T$, $u \in U \cap U'$, and $t, t' \in T$. Define
the relation $\sim$ by

$$(U, \phi, t) \sim (U', \phi', t') \iff D_{\phi(u)}(\phi' \circ \phi^{\leftarrow})\,t = t'.$$

a) Show that $\sim$ is an equivalence relation on the set of such triples.

b) Show that if $(U, \phi, t) \sim (U', \phi', t')$ and $(U, \phi, s) \sim (U', \phi', s')$, then

$$(U, \phi, t + s) \sim (U', \phi', t' + s')$$

and

$$(U, \phi, ta) \sim (U', \phi', t'a) \quad \text{for all } a \in \mathbf{R}.$$

Hence we have a well defined addition and scalar multiplication on
the set $T_u M$ of $\sim$ equivalence classes, making it a vector space. Then
$T_u M$ is the tangent space to $M$ at $u$ and the $\sim$ equivalence classes are
tangent vectors to $M$ at $u$.

4. If $X$ in Exercise 3 is $\mathbf{R}^n$, we may write $\phi$, $\phi'$ in the form $\phi(u) =
(x^1(u), \ldots, x^n(u))$, $\phi'(u) = (x'^1(u), \ldots, x'^n(u))$. By a standard abuse
of language, if $(x^1, \ldots, x^n) = a$, we also write $\phi'(\phi^{\leftarrow}(a)) =
(x'^1(a), \ldots, x'^n(a))$. Then if $t = (t^1, \ldots, t^n)$, $t' = (t'^1, \ldots, t'^n)$, show
that

$$(U, \phi, t) \sim (U', \phi', t') \iff t'^i = \frac{\partial x'^i}{\partial x^j}\,t^j \qquad (*)$$

by applying 1.03.

The traditional description of a vector in a differential context
is “a set of $n$ numbers that transform according to $*$”. Two sets of
$n$ numbers, associated with two different charts, represent the same
vector if they are related by the formula $*$.
(Warning: in the really confused books, you are told that a vector is
a set of functions that transform according to $*$. What they mean is
a vector field, which we come to shortly. Sometimes they say
“quantities”, which is at least vague enough not to be wrong.)

5. a) If $i : M \to \mathbf{R}^n$ is the inclusion used in 2.03, use Theorem 1.04 to show
that if $(U, \phi)$ and $(U', \phi')$ are charts on $M$ with $x \in U \cap U'$, then

$$(U, \phi, t) \sim (U', \phi', t') \iff D_{\phi(x)}(i \circ \phi^{\leftarrow})\,t = D_{\phi'(x)}(i \circ \phi'^{\leftarrow})\,t'.$$

Thus each tangent vector, in the sense of Exercise 4, is uniquely
represented by a single vector in $T_x\mathbf{R}^n$. The image of any member of it
in $T_x\mathbf{R}^n$ is the same.

b) Show that these representing vectors form a subspace, $Y$ say, of $T_x\mathbf{R}^n$,
and that the function, from the version of $T_x M$ defined in Exercise 3
to $Y$, taking each tangent vector to its representative in $Y$, is a vector
space isomorphism.

c) Give a precise definition for the geometrical notion of “tangent affine
subspace” used in 2.03 (for instance as the union of the set of straight
lines in $\mathbf{R}^n$ tangent at $x$ to curves in $M$, which defines an affine
subspace, cf. VIII.§1), and show that $Y$ coincides with $T_x M$ as defined
in 2.03.

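6. a) Show that if $f : M \to N$ is differentiable at $u \in U \cap U'$, and $(V, \psi)$ is
a chart on $N$ with $f(u) \in V$, then

$$(U, \phi, t) \sim (U', \phi', t') \implies D_{\phi(u)}(\psi \circ f \circ \phi^{\leftarrow})\,t = D_{\phi'(u)}(\psi \circ f \circ \phi'^{\leftarrow})\,t'$$

$$\implies \bigl(V, \psi, D_{\phi(u)}(\psi \circ f \circ \phi^{\leftarrow})\,t\bigr) \sim \bigl(V, \psi, D_{\phi'(u)}(\psi \circ f \circ \phi'^{\leftarrow})\,t'\bigr)$$

so that $f$ induces a well defined map

$$D_u f : T_u M \to T_{f(u)} N$$

taking the $\sim$ equivalence class of $(U, \phi, t)$ to that of $\bigl(V, \psi, D_{\phi(u)}(\psi \circ
f \circ \phi^{\leftarrow})\,t\bigr)$, and prove that $D_u f$ is linear.

b) We can now take the derivative at $u \in U$ of the chart $\phi : U \to X$
since $U$ (being open in $M$) and $X$ now both have differential structures.
Show that $D_x \phi$ just takes each $t \in T_x M$ to its representative
in $T_{\phi(x)} X$.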

7. Suppose that $f : M \to M'$ is a $C^k$ map between manifolds and that
for some $x \in M$ the derivative $D_x f$ is injective. Let $\dim M = m \le
n = \dim M'$.

a) Deduce from 1.05 that $x$ has a neighbourhood $N$ such that $f|_N$ is
injective.
b) Construct a chart $\phi : U \to \mathbf{R}^n$ around $f(x)$ such that

$$\phi(f(N)) = \phi(U) \cap \{ (x^1, \ldots, x^n) \mid x^{m+1} = x^{m+2} = \cdots = x^n = 0 \}.$$

(For $A : \mathbf{R}^n \to \mathbf{R}^m$ as constructed in proving 1.05, use $A$ as a projection
$\mathbf{R}^n \to \mathbf{R}^m$ and move an $(n - m)$-dimensional subspace through each
$f(y)$ to a new origin.)

c) Show that if $B : \mathbf{R}^n \to \mathbf{R}^m : (x^1, \ldots, x^m, \ldots, x^n) \mapsto (x^1, \ldots, x^m)$, then
$B \circ \phi \circ f$ is a chart map admissible on $M$.

d) Deduce that $x$ and $f(x)$ have $C^k$ charts around them which give $f$ the
local coordinate form

$$(x^1, \ldots, x^m) \mapsto (x^1, \ldots, x^m, 0, \ldots, 0).$$

8. Suppose $f : M \to M'$ is a $C^k$ map between manifolds and that for
some $x \in M$ the derivative $D_x f$ is surjective with $\dim M = m \ge n =
\dim M'$.

a) Show similarly to Exercise 7 that $x$ and $f(x)$ have charts around them
giving $f$ the local form

$$(x^1, \ldots, x^{m-n}, x^{m-n+1}, \ldots, x^m) \mapsto (x^{m-n+1}, \ldots, x^m).$$

(Hint: construct for some neighbourhood $N$ of $x$ a function $F : N \to
M' \times \mathbf{R}^{m-n} : y \mapsto (f(y), ?)$ such that $D_x F$ is bijective, and use 1.04.)

b) Deduce that if for some $p \in M'$ every $x \in f^{\leftarrow}(p)$ has $D_x f$ surjective,
then a chart giving coordinates $(x^1, \ldots, x^{m-n})$ of $f^{\leftarrow}(p)$ may be
constructed around each $x \in f^{\leftarrow}(p)$. Prove that these make $f^{\leftarrow}(p)$ into a
$C^k$ manifold by satisfying M i) - M iii).

c) Deduce in particular that if $f : \mathbf{R}^n \to \mathbf{R}$ is $C^\infty$ and has $D_x f \neq 0$ for
every $x \in f^{\leftarrow}(1)$, then $f^{\leftarrow}(1)$ has the structure of a smooth $(n - 1)$-manifold.
Construct such functions $f$ to deduce with less work than
in Exercise 1 that the sets there given in a, b, c are manifolds.
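The hypothesis of Exercise 8c is easy to test numerically in the simplest case. This sketch, assuming NumPy, takes $f(x) = x \cdot x$ on $\mathbf{R}^3$ with the standard dot product (an illustrative choice), samples points of the level set $f^{\leftarrow}(1) = S^2$, and checks that the derivative $D_x f$, represented by the gradient $2x$, is non-zero at each of them.

```python
import numpy as np

def f(x):
    """f(x) = x . x for the standard dot product on R^3."""
    return float(np.dot(x, x))

def grad_f(x):
    """Gradient of f; D_x f applied to t is grad_f(x) . t."""
    return 2.0 * x

rng = np.random.default_rng(0)
samples = rng.normal(size=(5, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)   # points of the level set f = 1

levels = [f(x) for x in samples]
grad_norms = [np.linalg.norm(grad_f(x)) for x in samples]   # each equals 2
```

A sample check like this is no proof, but here the general argument is immediate: $|2x| = 2$ whenever $x \cdot x = 1$.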


3. Bundles and Fields 

From elementary vector analysis, the idea is familiar of a “vector field”. That
is, a choice of vector at each point in $\mathbf{R}^3$, varying smoothly from point to
point. Transferred to general manifolds, this will mean a choice of tangent
vector, obviously; but how do we interpret smoothness? Plainly, it must be
as smoothness of the map ($x \mapsto$ chosen vector at $x$). This means that we
need a differential structure on the set of all tangent vectors $\bigcup_{x \in M} T_x M$,
denoted by $TM$. (Each subset $T_x M$ of course has already a differential
structure, being a finite-dimensional vector - and hence affine - space.) We
can do this in the embedded picture (Fig. 3.1), labelling the vectors by their
beginning and end points $(x, x + t)$ and considering the differentiability of
the vector field in terms of the resulting map $M \to \mathbf{R}^n \times \mathbf{R}^n = \mathbf{R}^{2n}$. This is
intuitive, but cumbersome; it is more convenient to handle $TM$ directly, via
the construction of the tangent spaces by means of the charts. The results
are essentially the same.

Fig. 3.1

One minor technical point is needed here. In the last paragraph, in order
to differentiate a map taking values in $\mathbf{R}^n \times \mathbf{R}^n$ we used the obvious
identification with $\mathbf{R}^{2n}$, which is an affine space, so that our definitions applied.
There is an equally obvious affine space (respectively, vector space) structure
on the set-theoretical product $X \times Y$ of any two affine (respectively, vector)
spaces $X$ and $Y$. (We could have introduced these in Chapters I and II, but
not so easily have explained their usefulness.) The details are collected in
Exercise 1.

3.01. Theorem. If $M$ is an $n$-manifold modelled on an affine space $X$, with
atlas $\{ (U_a, \phi_a) \mid a \in A \}$ we set

$$TM = \bigcup_{x \in M} T_x M, \qquad TU_a = \bigcup_{x \in U_a} T_x M \subseteq TM.$$

Then $\{ (TU_a, D\phi_a) \mid a \in A \}$ is an atlas making $TM$ a $2n$-manifold modelled
on $X \times X$, where $D\phi_a : TU_a \to T\mathbf{R}^n$ is defined by $D\phi_a|_{T_x M} = D_x \phi_a$. (We
are just taking all the derivatives at once to make one big map.)

Proof. (The diagram must regrettably be for $M$ a 1-manifold, since otherwise
$TM$ is at least four-dimensional! Recall Fig. II.1.4.)

We must confirm the axioms M i) - M iii) of Definition 2.01; $\dim(TM) =
2n$ by Exercise 1b,c.

Fig. 3.2

M i) $\bigcup_{a \in A} TU_a = \bigcup_{a \in A} \bigl( \bigcup_{x \in U_a} T_x M \bigr) = \bigcup_{x \in M} T_x M = TM$.

M ii) We fix the topology on $TM$, similarly to VI.3.01 (the weak topology)
by taking as open the smallest family of sets that makes all
the $D\phi_a$’s continuous. (This we must have, if we hope for
homeomorphisms!) That is, we take the family of finite intersections, and
arbitrary unions of these, of sets of form $(D\phi_a)^{\leftarrow}(W)$, for $W$ open
in $X \times X$ (cf. Exercise 1e).

Now we know that each $D\phi_a$ is injective, for $\phi_a$ is bijective: if $t$, $t'$ are in
the tangent spaces at $x \neq x'$, say, they are mapped by $D\phi_a$ into the disjoint
tangent spaces at $\phi_a(x) \neq \phi_a(x')$. If they are in the same tangent space at
$x$ say, use the fact that $D\phi_a|_{T_x M} = D_x \phi_a$ is an isomorphism and hence
injective. Similarly $D\phi_a$ is surjective. We have just picked the topology on
$TM$ to make each $D\phi_a$ continuous, so to prove M ii) it remains only to check
that each $(D\phi_a)^{\leftarrow}$, which exists since $D\phi_a$ is injective, is continuous. This
means by Definition VI.1.09 that

$$U \text{ open in } TM \implies ((D\phi_a)^{\leftarrow})^{\leftarrow}(U) \text{ open in } X \times X.$$

(Note that $((D\phi_a)^{\leftarrow})^{\leftarrow}(U) = D\phi_a(U \cap TU_a)$, since $D\phi_a$ need not be defined
on all of $U$.)

This follows if we prove it for $U$ of the form $(D\phi_b)^{\leftarrow}(W)$, with $W$ open
in $X \times X$, since any open set, by definition, is a union of finite intersections
of such (Exercise 3a). Hence we want

$$W \text{ open in } X \times X \implies ((D\phi_a)^{\leftarrow})^{\leftarrow}\bigl((D\phi_b)^{\leftarrow}(W)\bigr) \text{ open in } X \times X.$$

That is, $W$ open in $X \times X \implies (D\phi_b \circ (D\phi_a)^{\leftarrow})^{\leftarrow}(W)$ open in $X \times X$
(cf. Exercise 3b). Hence we want $D\phi_b \circ (D\phi_a)^{\leftarrow}$, equal by the Chain Rule to
$D(\phi_b \circ \phi_a^{\leftarrow})$, to be continuous. But this follows at once from the requirement
on $M$ that each $\phi_b \circ \phi_a^{\leftarrow}$ be continuously differentiable, so we are done.

(We have dropped the temporary expedient (1.02) of forgetting where
tangent spaces are attached, so that $D(\phi_b \circ \phi_a^{\leftarrow})$ is properly seen as a map
$\phi_a(U_b) \times X \to \phi_b(U_a) \times X$, linear on tangent spaces $T_x X = \{x\} \times X$, not
a map $\phi_a(U_b) \to L(T; T)$. It is immediate that continuity in this view is
equivalent to the earlier definition. Notice again the crucial nature of the
continuity requirement (cf. Exercise 1.7, Exercise 3c). This is why we have
not even given a name to things satisfying M i) and M ii) but with the $\phi_b \circ \phi_a^{\leftarrow}$
only differentiable, since in the absence of this theorem they are of little
use beyond what comes from satisfying M i) and M ii) with no differential
conditions at all.)

M iii) Define a map

$$A : X \times X \to X \times T : (x, y) \mapsto (x, \mathbf{d}(x, y)) .$$

This is an affine map, hence its derivative everywhere is just its linear part $\mathbf{A}$;
trivially, it is $C^\infty$. It nicely disentangles "affine space directions" in $X \times X$,
seen as a union of tangent spaces, from "tangent vector directions" (Fig. 3.3),
so simplifying the algebra. The point is that in Fig. 3.3(a) a "horizontal"
movement changes the vector, for instance from zero to non-zero, but not in
Fig. 3.3(b).

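We know by Exercise 1f that the derivative of

$$A \circ (D(\phi_b \circ \phi_a^{\leftarrow})) \circ A^{\leftarrow} : \phi_a(U_b) \times T \to \phi_b(U_a) \times T$$

$$(q, t) \mapsto \big((\phi_b \circ \phi_a^{\leftarrow})q,\; D_q(\phi_b \circ \phi_a^{\leftarrow})t\big) = (p, s) \text{ for short}$$

(why does this map have the expression given?) at $(q, t)$ is exactly

$$D_q(\phi_b \circ \phi_a^{\leftarrow}) \oplus D_t(D_q(\phi_b \circ \phi_a^{\leftarrow})) : T_qX \oplus T_t(T_qX) \to T_pX \oplus T_s(T_pX) .$$

Since $D_q(\phi_b \circ \phi_a^{\leftarrow})$ is linear, we can identify this with

$$D_q(\phi_b \circ \phi_a^{\leftarrow}) \oplus D_q(\phi_b \circ \phi_a^{\leftarrow}) : T_qX \oplus T_qX \to T_pX \oplus T_pX .$$

We then have by the Chain Rule that

$$D_{(q,t)}\big(D(\phi_b) \circ (D(\phi_a))^{\leftarrow}\big) = D_{(q,t)}\big(D(\phi_b \circ \phi_a^{\leftarrow})\big)$$

$$= D\big(A^{\leftarrow} \circ (A \circ D(\phi_b \circ \phi_a^{\leftarrow}) \circ A^{\leftarrow}) \circ A\big)$$

$$= \mathbf{A}^{\leftarrow} \circ \big(D_q(\phi_b \circ \phi_a^{\leftarrow}) \oplus D_q(\phi_b \circ \phi_a^{\leftarrow})\big) \circ \mathbf{A}$$

so that the differentiability, and its continuity, of $D(\phi_b) \circ (D(\phi_a))^{\leftarrow}$ follows
from that of $\phi_b \circ \phi_a^{\leftarrow}$; similarly for higher derivatives. (Recall that we are
assuming all manifolds to be $C^\infty$ unless otherwise stated.) □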

3.02. Language. Notice the very different roles played in this proof by
$D_q$ and $D$: For any map $f : M \to N$ between manifolds we define $Df :
TM \to TN$ by the requirement that for any $q \in M$, $D_qf : T_qM \to T_{f(q)}N$
is just the restriction of $Df$ to $T_qM$. However, $D_qf$ is a linear map between
vector spaces while $Df$ is a map between manifolds, which for the case above
we had to prove was again differentiable. (In general this does not follow from
the differentiability of $f$. In fact $f$ is $C^1$ exactly when $Df$ exists and is
continuous, $C^2$ when $Df$ is $C^1$, and so on, which gives a cleaner definition. The
proof that this is equivalent to the chart definition (2.02) is purely a
mechanical check.) We shall therefore call them by different names: Whereas the
derivative of $f$, at a point $q \in M$, is the linear map $D_qf$ approximating $f$ at
$q$, the differential of $f$ is the map $Df$ between the manifolds $TM$ and $TN$.
In this we are neither following nor departing from standard usage, because
there is none: various authors use the words variously. So do not expect the
distinction always to be made in the same way elsewhere.

One special case is worth a special symbol. If $f$ is a real valued function,
then $D_qf$ is a linear map $T_qM \to T_{f(q)}R$. Composed with the isomorphism
$\mathbf{d}_{f(q)} : T_{f(q)}R \to R$ this gives us a linear map $T_qM \to R$, whose geometrical
meaning is as explained in 1.02 and Fig. 1.4 for affine spaces. The map
$TM \to R$ obtained by combining all of these we denote by $df$ (this usage is
standard) as distinct from the map $Df : TM \to TR$. The map $df$ is highly
important: it is often called the gradient of $f$. We shall extend our use of the
word "differential" to include $df$ also by a mild abuse of language. Notice
that although $(df)_q$ is a (covariant) vector at $q$, we do not make it boldface.
We have chosen this inconsistency to avoid confusion with the $\mathbf{d}$, $\mathbf{d}_x$ etc. that
we use for affine structures.

3.03. Definition. The tangent bundle on a manifold $M$ is the manifold $TM$
together with the map (trivially a $C^\infty$ map) taking each tangent vector down
to its point of attachment:

$$\Pi : TM \to M , \quad \text{where } \Pi(T_qM) = \{q\} .$$

Fig. 3.4

Locally, $\Pi$ looks like the projection of a product, as in Fig. 3.4; globally it
may do so, as in Fig. 3.2, but it often does not, as we shall see.

The bundle of tensors of type $\binom{k}{h}$ (cf. V.1.10) on $M$ is defined by taking
the tensor product

$$(T_xM)^k_h = \underbrace{T_xM \otimes \cdots \otimes T_xM}_{k \text{ times}} \otimes \underbrace{(T_xM)^* \otimes \cdots \otimes (T_xM)^*}_{h \text{ times}}$$

over each point $x \in M$, and joining the separate spaces up into a manifold
called $T^k_hM$. (The formal details of this construction are technically laborious
but contain no ideas not in the proof of 3.01. The main challenge is to find a
sufficiently succinct notation to get formulae for the maps involved that do
not spread over more than two lines. This is a highly worthwhile exercise, and
left as such in Exercise 4.) The bundle is then $T^k_hM$ together with the map
$\Pi^k_h$ taking each tensor in $(T_xM)^k_h$ to $x$, as for the tangent vectors above. (We
may for brevity refer to the bundle as simply $T^k_hM$, not the pair $(T^k_hM, \Pi^k_h)$,
but the map is always to be understood as part of the structure.) Tensor
bundles of type $({}^{k}{}_{h}{}^{m}{}_{n})$, etc., are defined similarly.

We make similar abbreviations to those in Chap. V.1.10, of $T^0_hM$ to $T_hM$,
$T^k_0M$ to $T^kM$, $T_1M$ to $T^*M$, $(T_xM)^*$ to $T^*_xM$, $\Pi^1_0$ to $\Pi$ etc. By convention
$T^0_0M$ is just the product manifold $M \times R$ together with the projection $\Pi^0_0 :
T^0_0M \to M : (x, r) \mapsto x$ as bundle map, so that $(\Pi^0_0)^{\leftarrow}(x) = (T_xM)^0_0$. We
have a natural bijection $f \mapsto (x \mapsto (x, f(x)))$ between functions $M \to R$ and
$\binom{0}{0}$-tensor fields $M \to T^0_0M$ and we generally identify the two ideas.

For any of these bundles the vector space $(T_xM)^k_h$ (sometimes denoted by
$(T^k_hM)_x$, according to taste), for a particular $x \in M$, is the fibre at, or over,
$x$. This word is suggested by Fig. 3.4 where the fibres are one-dimensional,
but applies regardless of dimension.

3.04. Definition. A $C^r$ tensor field of type $\binom{k}{h}$ on a manifold $M$ is a $C^r$
section of the bundle $T^k_hM$; that is, a $C^r$ map $\mathbf{v} : M \to T^k_hM$ such that
$\Pi^k_h \circ \mathbf{v} = I_M$ (Fig. 3.5). This is precisely "choice of a tensor at each point
of $M$" in the manner of the beginning of this section. (We shall be concerned
so invariably with $C^\infty$, or smooth, fields that we shall take any tensor field
to be $C^\infty$ unless otherwise indicated, as we do for manifolds.)

We shall denote the ($\infty$-dimensional) vector space of all $\binom{k}{h}$-tensor fields
on $M$ by $\mathcal{T}^k_hM$, omitting 0's etc., as in 3.03.

Notice that we use a symbol of the same kind (bold lower case, like v) 
for a tensor field as for a single tensor. The context should make clear what is 
meant, even when the overwhelming weight of tradition makes us abbreviate 
“tensor field” to just “tensor” in particular cases, such as the Einstein tensor. 
We shall do so as rarely as we can help, but with no further apology. 

Sometimes, particularly when the value of $\mathbf{v}$ at $x \in M$ is a function
(for instance $\mathbf{v}$ of type $\binom{0}{1}$, $\mathbf{v}(x) : T_xM \to R$, and $\mathbf{v}$ of type $\binom{0}{2}$, $\mathbf{v}(x) :
T_xM \times T_xM \to R$, are linear and bilinear maps respectively) we want an
alternative way to write $\mathbf{v}(x)$. This is to avoid expressions like $\mathbf{v}(x)(t)$ or
$\mathbf{v}(x)(s,t)$. We introduce the notation $\mathbf{v}_x$ for $\mathbf{v}(x)$ to achieve this. If $\mathbf{v}$ has
a complicated expression, we may either evaluate each factor at $x$ separately
or put brackets round the lot and attach the subscript $x$ to the bracket.

We deviate from lower case in a similar way to our previous usage for
single tensors (where, for example, a bilinear form in $L^2(X; R)$ was
denoted by $\mathbf{F}$ (cf. IV.1.01, V.1.03, V.1.10)), and from boldface in an instance
discussed in the next section.

In particular we have:

A contravariant vector field is a section of $TM$, that is, a smooth choice
of a tangent vector at each point. This is illustrated in the embedded case
by Fig. 3.6. Contravariant vector fields are sometimes called tangent vector
fields.

Fig. 3.5

A covariant vector field is a section of $T^*M$. In particular the gradient
$df$ of a smooth function $f$ is always a covariant vector field (cf. 3.02), but the
converse is untrue. A linear functional $f$ in $T^*_xM$ is often called a cotangent
vector at $x$, and covariant vector fields are sometimes called cotangent vector
fields or one-forms.

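If we have a $\binom{k}{h}$-tensor field $\mathbf{t}$ and an $\binom{l}{j}$-tensor field $\mathbf{s}$ on $M$, then in each
$(T^k_hM)_x \otimes (T^l_jM)_x = (T^{k\,l}_{h\,j}M)_x \cong (T^{k+l}_{h+j}M)_x$ we have an element $\mathbf{t}_x \otimes \mathbf{s}_x$.
These define sections $x \mapsto \mathbf{t}_x \otimes \mathbf{s}_x$ of the $({}^{k}{}_{h}{}^{l}{}_{j})$ and $\binom{k+l}{h+j}$-tensor bundles
which, subject to the trivial check of being $C^\infty$ if $\mathbf{t}$ and $\mathbf{s}$ are, give us
$({}^{k}{}_{h}{}^{l}{}_{j})$ and $\binom{k+l}{h+j}$-tensor fields on $M$. In particular, if $\mathbf{t}$ is of type $\binom{0}{0}$, that is
just a function $t : M \to R$, then $\mathbf{t} \otimes \mathbf{s}$ is just $t\mathbf{s}$ with $(t\mathbf{s})_x$ the scalar multiple
$t(x)\mathbf{s}(x)$.

For mixed tensors we can define various contraction maps

$$C^j_i : T^k_hM \to T^{k-1}_{h-1}M , \quad j \le k ,\; i \le h ,$$

fibre by fibre, exactly as in Chap. V.1.11. We have correspondingly for tensor
fields the maps

$$\mathcal{T}^k_hM \to \mathcal{T}^{k-1}_{h-1}M : \mathbf{t} \mapsto C^j_i \circ \mathbf{t} ,$$

which we shall also denote by the symbols $C^j_i$.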

A metric tensor field is a section $\mathbf{G}$ of $T_2M$ such that for each $x \in M$,
$\mathbf{G}(x)$ is a metric tensor (IV.1.01(vii)) on $T_xM$. $\mathbf{G}$ is a Riemannian structure
on $M$ if each $\mathbf{G}(x)$ is an inner product, a pseudo-Riemannian structure in the
indefinite case. In particular, if $M$ is a 4-manifold and the signature (IV.3.09)
of $\mathbf{G}(x)$ is everywhere $-2$, then $\mathbf{G}$ is a Lorentz structure. A manifold with
one of these structures is called a Riemannian, pseudo-Riemannian or Lorentz
manifold accordingly. A Lorentz manifold is often called a spacetime. The
definitions of timelike, spacelike and null vectors (IV.1.04) extend in the
obvious way to tangent vectors and fields on a pseudo-Riemannian manifold.

3.05. Definition. Taking $R^n$ as an affine space (and hence a manifold
modelled on $R^n$ by way of the single identity chart $(I_{R^n}, R^n)$) with difference
function $\mathbf{d}(x, y) = y - x$, we define the standard Riemannian structure on
$R^n$ from the standard inner product (IV.1.03) by the equation

$$\mathbf{G}(x)(t, t') = (\mathbf{d}_x t) \cdot (\mathbf{d}_x t')$$

(where $\mathbf{d}_x$ is as defined in II.1.02) for each point $x \in R^n$. We usually
abbreviate this to $t \cdot t'$. Call $R^n$ with this metric tensor field Euclidean n-space and
denote it by $E^n$ to distinguish it from its vector space which we still call $R^n$.

We define the standard Lorentz structure on $R^4$ similarly. Call the result
Minkowski space $M^4$, as distinct from its vector space $L^4$ (IV.1.05). Note
the distinction. Lorentz space $L^4$ is a vector space with a metric tensor,
while Minkowski space $M^4$ is an affine space with a (constant) metric tensor
field. (The affine space $R^4$ can have a geometry given by an interesting
non-constant $C^\infty$ metric tensor field but a metric tensor on the vector space $R^4$
is a single bilinear form.)
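
As a small numerical illustration of a constant Lorentz metric tensor field on $R^4$, the sketch below evaluates the bilinear form in the standard chart and sorts vectors into timelike, spacelike and null. It rests on assumptions that are not fixed by the passage above: the diagonal form (+1, -1, -1, -1), which has signature -2, and the sign convention that "timelike" means $\mathbf{G}(v, v) > 0$ (some authors use the opposite sign). The names `ETA`, `metric` and `classify` are hypothetical.

```python
# Sketch under stated assumptions: constant Lorentz metric on R^4 with
# diagonal (+1, -1, -1, -1), and "timelike" taken to mean G(v, v) > 0.

ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal components in the standard chart


def metric(v, w):
    """G(x)(v, w); the point x plays no role because the field is constant."""
    return sum(g * a * b for g, a, b in zip(ETA, v, w))


def classify(v):
    """Classify a tangent vector by the sign of G(v, v).

    The zero vector is lumped in with the null vectors here for simplicity.
    """
    q = metric(v, v)
    if q > 0:
        return "timelike"
    if q < 0:
        return "spacelike"
    return "null"


time_axis = (1.0, 0.0, 0.0, 0.0)
space_axis = (0.0, 1.0, 0.0, 0.0)
light_ray = (1.0, 1.0, 0.0, 0.0)
labels = [classify(v) for v in (time_axis, space_axis, light_ray)]
```

A non-constant metric tensor field would replace the tuple `ETA` by a function of the point $x$; the classification at each point would otherwise be unchanged.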

These metric tensor fields are constant, in the sense of being given on the
vector space and transferred to the tangent spaces by means of the
canonical isomorphisms $\mathbf{d}_x$. Only affine spaces have such $\mathbf{d}_x$'s as a part of their
structure, and hence only on affine spaces can "constant tensors" be defined
in this absolute sense except for three trivial cases:

(i) A tensor field of type $\binom{0}{0}$ is just a function, and can obviously be a
constant one.

(ii) The zero tensor field $\mathbf{0}$ of any type $\binom{k}{h}$, with $\mathbf{0}(x) = 0 \in (T^k_hM)_x$.

(iii) The identity $\binom{1}{1}$-tensor field $I_x : T_xM \to T_xM$, its dual, and their
scalar multiples and tensor powers (like $3\, I \otimes I \otimes I^*$) may reasonably be called
constant, since in any chart their components are constant.

(If $M$ has a metric tensor field $\mathbf{G}$, there is a form of constancy relative
to $\mathbf{G}$ for other fields, which we define in VIII.7.10. The three cases (i) - (iii)
above are constant relative to any $\mathbf{G}$.)

If a manifold $M$ is embedded in $R^n$, the standard Riemannian structure
on $R^n$ can be applied to pairs of vectors tangent to $M$ at any point $x$, and
this obviously defines an induced Riemannian structure on $M$. The same
may hold for pseudo-Riemannian structures, but it does not hold necessarily,
for the tangent space $T_xM$ may be a degenerate subspace of $T_xR^n$ with the
given metric tensor (cf. IV.1.01 and 1.09, and Figs. 3.7 and IV.1.5). An
important case where it does hold appears in Chap. VIII, Exercise 1.4.

It is a fact that any metric tensor field, on any manifold, can be so
induced by an embedding in some $R^n$ with a constant metric tensor, but this
is the consequence of deep, comparatively recent, technique, not a classical
result. It is far beyond the scope of this book. Moreover, $n$ may need
to be very large. For example, a Riemannian 2-manifold may need up to
10 dimensions for this, a spacetime may need up to 87 spacelike and 3 timelike
dimensions, [Clarke]. (These numbers are not known to be best possible; this
would require specific examples with proofs that no smaller flat space would
hold them, even harder than the proof that any spacetime fits in 90 flat
dimensions.) We shall see (X.1.08) a "flat" metric on the torus induced by a
four-dimensional embedding: it is not hard to show that no three-dimensional
embedding can induce it.


Exercises VII. 3 

1. a) If $S$, $T$ are vector spaces, prove that the definitions

$$(s, t) + (s', t') = (s + s',\; t + t')$$

$$(s, t)a = (sa,\; ta)$$

for $s, s' \in S$, $t, t' \in T$, $a \in R$, make the set $S \times T$ of ordered pairs
$(s, t)$ into a vector space, the product or direct sum of $S$ and $T$,
denoted by $S \oplus T$. (The abstract
definitions of "product" and "sum" applicable here coincide for vector
spaces.) We often identify $(s, 0)$ with $s$, $(0, t)$ with $t$, and hence write
$(s, t)$ as $s + t$.

b) Show that $\dim(S \oplus T) = \dim S + \dim T$, by considering bases.

c) If $X$, $Y$ are affine spaces with vector spaces $S$, $T$ and difference
functions $\mathbf{d}_X$, $\mathbf{d}_Y$ respectively, prove that the map

$$\mathbf{d} : ((x, y), (x', y')) \mapsto \mathbf{d}_X(x, x') + \mathbf{d}_Y(y, y')$$

from $(X \times Y) \times (X \times Y)$ to $S \oplus T$ is a difference function making $X \times Y$
an affine space with vector space $S \oplus T$. The affine space is called the
product of $X$ and $Y$, and denoted still by $X \times Y$. (The ideas of "sum"
and "product" do not coincide here, precisely because there is no natural way
of identifying $x \in X$ with any particular $(x, y) \in X \times Y$. The full
treatment of these ideas is an (elementary) part of category theory.)

d) Show that if subspaces $S$, $T$ of a vector space $X$ have the property
that any $x \in X$ can be written as a sum

$$x = s + t ,$$

with $s \in S$, $t \in T$ unique, so that

$$s + t = s' + t' \;\Rightarrow\; s = s' ,\; t = t'$$

for $s, s' \in S$, $t, t' \in T$, then there is an isomorphism

$$S \oplus T \to X : (s, t) \mapsto s + t .$$

Thus for instance Exercise I.3.8 shows that if $P$ is a projection,
$X \cong P(X) \oplus \ker P$. Lemma IV.2.04 is a special case of this.

e) Show that if $X$ and $Y$ have the weak topology given by their affine
space structures, the product topology on $X \times Y$ coincides with the
weak topology given by the product affine space structure.

f) Prove that if $X$, $X'$ are affine spaces, the map

$$T_{(x,x')}(X \times X') \to (T_xX) \oplus (T_{x'}X') : ((x, x'), (y, y')) \mapsto ((x, y), (x', y'))$$

is an (obviously natural) vector space isomorphism. Show that if $f :
X \to Z$, $f' : X' \to Z'$ are maps between affine spaces the map

$$f \times f' : X \times X' \to Z \times Z' : (x, x') \mapsto (f(x), f'(x'))$$

is differentiable (respectively, $C^\infty$, $C^k$) at $(x, x')$ if and only if $f$ and
$f'$ are differentiable (respectively, $C^\infty$, $C^k$) at $x$ and $x'$, and that
$D_{(x,x')}(f \times f')$, where it exists, is the map

$$T_{(x,x')}(X \times X') \cong T_xX \oplus T_{x'}X' \xrightarrow{\;D_xf \,\oplus\, D_{x'}f'\;} T_{f(x)}Z \oplus T_{f'(x')}Z' \cong T_{(f \times f')(x,x')}(Z \times Z')$$

where the isomorphisms are as just defined.

2. a) If $M$, $N$ are manifolds modelled on affine spaces $X$, $Y$ with atlases
$\{ (U_a, \phi_a) \mid a \in A \}$, $\{ (V_b, \psi_b) \mid b \in B \}$, then if $M \times N$ has the product
topology and we define

$$\theta_{ab} : U_a \times V_b \to X \times Y : (u, v) \mapsto (\phi_a(u), \psi_b(v))$$

apply Exercise 1 to show that $\{ (U_a \times V_b, \theta_{ab}) \mid (a, b) \in A \times B \}$
is an atlas making $M \times N$ a manifold modelled on $X \times Y$, with
$\dim(M \times N) = \dim M + \dim N$.

b) Show that for any $x \in N$, the map $M \to M \times N : y \mapsto (y, x)$ is smooth.

c) Construct a natural isomorphism $T_{(p,q)}(M \times N) \to T_pM \oplus T_qN$.

d) If $M = N = S^1$, the unit circle $\{ (x, y) \in R^2 \mid x^2 + y^2 = 1 \}$, give
charts making $S^1$ a manifold modelled on the real line and construct a
diffeomorphism from $M \times N$ to a torus. (Consider the torus obtained
by rotating the circle $\{ (x, y, z) \in R^3 \mid y = 0 ,\; (x - 2)^2 + z^2 = 1 \}$ round
the z-axis.)

e) If $M$, $N$ have metric tensor fields $\mathbf{G}^M$, $\mathbf{G}^N$, show that the product
tensor field $\mathbf{G}^{M \times N}$ defined by

$$\mathbf{G}^{M \times N}_{(p,q)}(s + t,\; u + v) = \mathbf{G}^M_p(s, u) + \mathbf{G}^N_q(t, v)$$

where $s + t$, $u + v$ are the decompositions given by c) using the
identification mentioned in Exercise 1a), is a metric tensor field on $M \times N$,
positive definite if both $\mathbf{G}^M$ and $\mathbf{G}^N$ are.

3. a) Show that if $Y$ is a topological space in which every open set is a
union of finite intersections of members of a family $\mathcal{V}$ of open subsets
of $Y$, and $f : X \to Y$ is a map from a topological space $X$, then $f$ is
continuous if and only if $f^{\leftarrow}(V)$ is open for every $V \in \mathcal{V}$.

b) Show, by considering what it means to be a member of each set, that

$$(x, y) \in ((D\phi_a)^{\leftarrow})^{\leftarrow}\big((D\phi_b)^{\leftarrow}(W)\big) \iff (x, y) \in (D\phi_b \circ (D\phi_a)^{\leftarrow})^{\leftarrow}(W)$$

so that the two sets are equal.

Fig. 3.8

c) Find an atlas of charts on $R$ satisfying M i) - M iii) except that the
$\phi_b \circ \phi_a^{\leftarrow}$, though differentiable, are not continuously so. (Charts on $R$
are just real-valued functions; try an atlas consisting of $(I_R, R)$ and $f$,
where $f(x) = x(1 + |x| + x \sin\frac{1}{x})$ (Fig. 3.8).)

Show that Theorem 3.01 fails for $R$ with this atlas, in that the
$D\phi_a$ are not homeomorphisms and the topology that they induce on
$TR$ is not even Hausdorff.

4. Show that if $M$ is an n-manifold modelled on an affine space $X$ with
vector space $T$, $T^k_hM$ is an $(n + n^{k+h})$-manifold modelled on $X \times (T^k_h)$,
using the charts constructed in 3.01 on $TM$ to construct those on
$T^k_hM$.

4. Components 

It is at this point that the distinction between natural operations such as
taking tensor products, and non-natural ones involving a choice of basis,
becomes more than a matter of style. The former can be done smoothly, all
over the manifold at once, as we have just seen. A choice of basis often cannot.
(This is in contrast to the situation for a single vector space, where we always
could choose a basis, and the question was whether it helped us.) A smooth
choice of basis for each tangent space means, obviously, a set $\{\mathbf{t}_1, \ldots, \mathbf{t}_n\}$ of
smooth vector fields with the property that for each $x$, $\{\mathbf{t}_1(x), \ldots, \mathbf{t}_n(x)\}$ is
a basis for $T_xM$. In particular, each $\mathbf{t}_i$ must have $\mathbf{t}_i(x) \neq 0$ for all $x$. Now
for the tangent bundle of $S^1$ this is possible (drawn two ways in Fig. 4.1a)
but for the Möbius strip bundle over $S^1$ (Fig. 4.1b), with the same property
of looking like the projection $U \times R \to U$ over small open sets $U$ in $S^1$, it
is not. (This "local product structure" is the main defining property of a
"fibre bundle" in general. Since we shall only be concerned hereafter with
specific bundles constructed from $TM$ we shall not go into the technicalities
of this definition; this example, however, should be clear.) In the case of
the tangent bundle to $S^2$, the non-existence of globally non-zero vector fields
(Fig. 4.1c) is known as the Hairy Ball Theorem. (The name is due to the
consequence that, for $S^2$ embedded in $R^3$, no smooth - or even continuous -
choice of non-zero vectors $\mathbf{t}(x)$ in $T_xR^3$ for each $x \in S^2$ can have all the $\mathbf{t}(x)$
tangent to $S^2$. If a hair is attached to each point $x \in S^2$, and we take $\mathbf{t}(x)$ as
the unit vector along the hair at $x$, this implies that the coat of hair cannot
be everywhere continuously combed flat to $S^2$. The result also applies to
coconuts and dogs, insofar as they are topologically spheres.) The algebraic
topology needed to prove the Hairy Ball Theorem is outside the scope of this
book, but not very hard; the reader should consult Volume 1 of [Spivak].

Fig. 4.1

A fortiori, there is no smooth choice of basis for $T_xS^2$ at every point
$x \in S^2$. The possibility of such a choice is in fact quite a rare one (manifolds
for which we can do it are called parallelisable); for instance among compact
2-manifolds it can be done only for the torus (the Klein bottle carries a
nowhere-zero vector field but, being non-orientable, no global choice of
basis), and among spheres only for $S^1$, $S^3$ and $S^7$.

However, if $M$ is modelled on $R^n$, which we may suppose without loss
of generality (why?), a particular chart $\phi : U \to R^n$ gives coordinate labels
$(x^1(u), \ldots, x^n(u))$ to points $u \in U$. Take the standard basis for each tangent
space $T_xR^n$ to be $\mathbf{d}_x^{\leftarrow}(\mathcal{E})$ (cf. I.1.10), which by a minor abuse of language
we shall also denote by $\mathcal{E} = e_1, \ldots, e_n$. Then $\phi$ gives us (Fig. 4.2) an
obvious choice of basis $(D_u\phi)^{\leftarrow}(\mathcal{E}) = (D_u\phi)^{\leftarrow}e_1, \ldots, (D_u\phi)^{\leftarrow}e_n$ (depending
of course on the chart, and not defined for tangent spaces outside $U$) and a
corresponding dual basis for $T^*_uM$. These basis vectors have standard
symbols, far more convenient than $(D_u\phi)^{\leftarrow}(e_i)$ and $((D_u\phi)^{\leftarrow}(e_i))^*$ and far more
suggestive, which we shall now introduce. A little surprisingly the notation
for the dual basis is the simpler to explain, and we shall do this first.

Fig. 4.2

4.01. Covariant Vectors. The dual basis $\mathcal{E}^*$ to the standard basis $\mathcal{E}$ for $R^n$
consists of the coordinate functions $e^i$ (cf. III.1.06) and hence the dual basis
to $(D_u\phi)^{\leftarrow}(\mathcal{E})$ consists of the composite linear maps $e^i \circ D_u\phi$, $i = 1, \ldots, n$.
But since the $e^i$ are linear, $D_{\phi(u)}(e^i) = e^i$ (cf. Exercise 1.3e), and thus

$$e^i \circ D_u\phi = D_{\phi(u)}e^i \circ D_u\phi$$

$$= D_u(e^i \circ \phi) \quad \text{by the Chain Rule (Exercise 1.6)}$$

$$= D_ux^i$$

since $\phi(u) = (x^1(u), \ldots, x^n(u))$ means exactly that $e^i \circ \phi = x^i$. Strictly, we
are interested in maps $T_uM \to R$, not $T_uM \to TR$, so in the notation of 3.02
the i-th vector in this basis is $dx^i$. Doing this for each $i$ and each $u \in U$ gives
us vector fields $dx^1, \ldots, dx^n$ on $U$ such that any covariant vector field can be
written locally - that is, within the part $U$ of $M$ to which the "local choice of
coordinates" $\phi$ applies - as a linear combination

$$\mathbf{v} = v_1\,dx^1 + \cdots + v_n\,dx^n = v_i\,dx^i ,$$

with the $v_i$ real valued functions. (Expressed in this way, a covariant vector
field is often called a Pfaffian in older books.) "In coordinates", then,

$$\mathbf{v} = (v_1, \ldots, v_n)$$

or $v_i$ for short, with respect to the chart $\phi$.

If we are using, for example, $(x, y, z)$ or $(r, \theta)$ as labels via the chart,
instead of $(x^1, x^2, x^3)$ or $(x^1, x^2)$, we shall correspondingly call these basis
covectors $dx$, $dy$, $dz$ or $dr$, $d\theta$ and write $\mathbf{v}$ as $(v_x, v_y, v_z)$ or $(v_r, v_\theta)$ (cf. V.1.12).
4.02. Contravariant Vectors. If $t$ is a tangent vector at $u \in M$, we can
"differentiate" a function $f$ with respect to $t$ by taking the directional
derivative $D_uf(t)$ (cf. 1.02). If $t$ is one of the basis vectors $(D_u\phi)^{\leftarrow}(e_i)$ ($= \mathbf{b}_i$ for
short) we are interested in, then we agree as in 1.03 to denote $df(\mathbf{b}_i)$ by
$\frac{\partial f}{\partial x^i}$ or $\partial_i f$. For any tangent vector $t$ we have

$$D_uf(t) = D_uf(t^i\mathbf{b}_i)$$

$$= (t^i\partial_i)f ,$$

by linearity. Thus we can identify $t$ with the linear map

$$\partial_t : f \mapsto df(t)$$

since the correspondence

$$t \mapsto \partial_t$$

is both linear and natural (and injective, since $\partial_t \neq \partial_{t'}$ for $t' \neq t$). Having
done so, we have the $\partial_i = \frac{\partial}{\partial x^i}$ as a basis for $T_uM$; by a routine check
(Exercise 3) this is precisely the basis to which $dx^1, \ldots, dx^n$ is the dual. (As
with the $dx^i$, we have the $\partial_i$ as fields on $U$.) Clearly, the $\partial_i$ have their indices
rightly placed for contravariant basis vectors; whether the $i$ in $\frac{\partial}{\partial x^i}$ is "up"
or "down" is debatable, but we shall regard it as "down" for the sake of the
summation convention.

We shall carry this identification to the point of discarding the temporary
notation $\partial_t$ just introduced, and simply write $t(f)$ for $df(t)$: in coordinates

$$t = t^i\partial_i , \quad t(f) = t^i\partial_if .$$


(Notice that we are not writing $\partial_i$ in boldface, though it is a vector field
on $U$. The reason is that we are looking at it not as a vector-valued map
$U \to TU$ but as a function-valued map on functions, taking $f$ to $\partial_if$; thus
some inconsistency in notation is inevitable. This is at least consistent with
writing $dx^i$ not boldface in 4.01, which follows from the use of $df$ not boldface in 3.02.)

As in 4.01, for "named" rather than "numbered" coordinates we shall
write, for instance $\partial_x$, $\partial_y$, $\partial_z$ (or $\frac{\partial}{\partial x}$, $\frac{\partial}{\partial y}$, $\frac{\partial}{\partial z}$, etc.).

We have encountered here another of the ways to construct $TM$:
Each "$\partial_t$" is linear and has the property that

$$\partial_t(fg) = (\partial_tf)g + f(\partial_tg)$$

(Exercise 4). Also, any map $\delta$ from the space of smooth real-valued functions
on $M$ to itself that is linear and has

$$(\delta(fg))(x) = (\delta f(x))\,g(x) + f(x)\,\delta g(x)$$

for each point $x \in M$, called a derivation on $M$, turns out - though we shall
not prove this - to have

$$(\delta f)_x = \mathbf{t}_x(f)$$

for some unique vector field $\mathbf{t}$. Given an object corresponding, linearly and
naturally, to the collection of vector fields, it is clearly possible to reconstruct
the tangent bundle. (This particular construction only works properly for $M$
strictly smooth, i.e. $C^\infty$ not just $C^k$ for some large $k$. The difficulty is that
differentiating $C^\infty$ functions gives $C^\infty$ functions, but $C^k$ only $C^{k-1}$. This is
why we omit the technical details of this approach, and avoid proofs based
on it.)
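
The identification of a tangent vector $t$ with the map $f \mapsto t(f) = t^i\partial_if$, and the product rule it satisfies, can be checked in chart coordinates. The following is only a sketch: it uses dual numbers (a standard forward-mode differentiation device, not something taken from the text) to compute directional derivatives exactly, and the class `Dual`, the helpers `dsin`, `apply_vector`, `value` and the sample functions `f`, `g` are all hypothetical names chosen for illustration.

```python
import math


class Dual:
    """A number val + der*eps with eps**2 = 0; der carries a directional derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual._lift(other)
        # Product rule on the derivative part.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def dsin(x):
    """Sine extended to dual numbers."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def apply_vector(t, f, u):
    """t(f) at the point with coordinates u, for t given by its components t^i."""
    args = [Dual(ui, ti) for ui, ti in zip(u, t)]
    return f(*args).der


def value(f, u):
    """Plain value f(u), computed through the same dual-number code path."""
    return f(*[Dual(ui) for ui in u]).val


def f(x, y):
    return x * x * y + 3 * y


def g(x, y):
    return dsin(x) + x * y


def fg(x, y):
    return f(x, y) * g(x, y)


u = (0.5, -1.25)   # coordinates of the point
t = (2.0, 0.75)    # components t^i of the tangent vector

t_f = apply_vector(t, f, u)
leibniz_lhs = apply_vector(t, fg, u)
leibniz_rhs = t_f * value(g, u) + value(f, u) * apply_vector(t, g, u)
```

Here `t_f` equals $t^1\,2xy + t^2\,(x^2 + 3)$ evaluated at `u`, and `leibniz_lhs` agrees with `leibniz_rhs` up to rounding, which is the derivation property in coordinates.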

4.03. Tensors of Higher Degree. The basis for $(T^k_hM)_u$ induced by a chart
$(U, \phi)$, where $\phi(u) = (x^1(u), \ldots, x^n(u))$, is exactly the basis constructed from
$\partial_1, \ldots, \partial_n$ and its dual as in V.1.12: Thus the basis for, say, $(T^3_2M)_u$ is the
set of all $n^5$ vectors at $u$ of the form

$$\partial_i \otimes \partial_j \otimes \partial_k \otimes dx^l \otimes dx^m$$

where $\{i, j, k, l, m\} \subset \{1, \ldots, n\}$. Doing this for each $u$, we have $n^5$ fields.

We follow convention in abbreviating a tensor field, given as a linear
combination

$$\mathbf{w} = w^{ijk}_{lm}\,(\partial_i \otimes \partial_j \otimes \partial_k \otimes dx^l \otimes dx^m)$$

of these basis tensor fields, to $w^{ijk}_{lm}$. For "named" rather than "numbered"
coordinates, $(x, y, z)$ not $(x^1, x^2, x^3)$, write $w^{xy}{}_{z}$ for $w^{12}{}_{3}$ (and do not apply
the summation convention).

4.04. Transformation Formulae. Recall that a tangent vector $t \in T_uM$
was formally constructed (Exercise 2.3) as an equivalence class of vectors
representing it via charts. It follows that the components $t^i$ of $t$ in the basis
$\partial_1, \ldots, \partial_n$ induced by the chart $(U, \phi)$, with $\phi(u) = (x^1, \ldots, x^n)$, are
exactly those of its representative $D\phi(t) \in T_{\phi(u)}R^n$ in the standard basis $\mathcal{E}$ for
$T_{\phi(u)}R^n = R^n$. This is because $\partial_1, \ldots, \partial_n$ is exactly $(D_u\phi)^{\leftarrow}(\mathcal{E})$ (cf.
Exercise 2.6b).

If therefore $(U', \phi')$ is another chart, with $\phi'(u) = (x'^1, \ldots, x'^n)$, and
$t'^i$ are the components of $t$ with respect to the basis $\partial'_1, \ldots, \partial'_n$ induced by
$(U', \phi')$, we know by Exercise 2.4 that at any $u \in U \cap U'$,

$$t'^i = \frac{\partial x'^i}{\partial x^j}\,t^j . \tag{1}$$

This is the transformation formula (or rule, or law) for contravariant
vectors and vector fields.

For covariant vector fields, represented in the dual basis, we
therefore know the formula by III.1.07 once we know the inverse of the matrix
$\left[\frac{\partial x'^i}{\partial x^j}\right]$. But by 1.03 this matrix is just the Jacobian of $\phi' \circ \phi^{\leftarrow}$, which is
just the matrix of $D(\phi' \circ \phi^{\leftarrow})$. Therefore its inverse is the Jacobian
of $(\phi' \circ \phi^{\leftarrow})^{\leftarrow} = \phi \circ \phi'^{\leftarrow}$, which is $\left[\frac{\partial x^j}{\partial x'^i}\right]$. Hence the formula we want is

$$v'_i = v_j\,\frac{\partial x^j}{\partial x'^i} . \tag{2}$$


(One of the more baffling things last century, when people tried to look
at $\frac{\partial x^j}{\partial x'^i}$ as the "ratio of two infinitesimals" $\partial x^j$ and $\partial x'^i$, was the way $\frac{\partial x^j}{\partial x'^i}$
is not one over $\frac{\partial x'^i}{\partial x^j}$; it is the Jacobians as whole matrices that are inverse
to each other. This means that for instance the chain rule (Exercise 1.6),

$$\frac{\partial x''^i}{\partial x^k} = \frac{\partial x''^i}{\partial x'^j}\,\frac{\partial x'^j}{\partial x^k}$$

in components, is not the simple cancellation it formally resembles if you do
not realise the summation it involves. The room for confusion here is immense - and
was fully taken up; it is greatly reduced by starting from the coordinate-free
view point and finding components as needed.

Notice that in a "change of variables" from $(x^i)$'s to $(x'^i)$'s you
are given the $(x^i)$'s in terms of the $(x'^i)$'s. This means that for each
$(x'^1, \ldots, x'^n)$ you are told the corresponding $(x^1, \ldots, x^n)$. That is, you
have the formula for $\phi \circ \phi'^{\leftarrow}$ not $\phi' \circ \phi^{\leftarrow}$. Differentiating it gives directly
what is needed for formula (2), while to apply formula (1) you need to
invert the Jacobian at each point. This is even messier than with just
one matrix, at one point, to invert, and it was in this context that the
words "covariant" and "contravariant" were chosen the way they were
(cf. III.1.07).)
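
The remarks above can be made concrete with the change from Cartesian labels $(x, y)$ to polar labels $(r, \theta)$, where, as in a typical change of variables, the old coordinates are given in terms of the new: $x = r\cos\theta$, $y = r\sin\theta$. The sketch below is an illustration under that choice of charts (an assumption, not an example from the text); the names `jac_old_from_new`, `inverse_2x2` and the sample components are hypothetical. It forms the Jacobian $[\partial x^j / \partial x'^i]$ directly, inverts it to get $[\partial x'^i / \partial x^j]$, transforms contravariant components with the latter and covariant components with the former, and compares a single entry with the reciprocal of its "upside-down" counterpart.

```python
import math


def jac_old_from_new(r, theta):
    """B[j][i] = d x^j / d x'^i for (x, y) = (r cos(theta), r sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s],
            [s, r * c]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]


r, theta = 2.0, math.pi / 3
B = jac_old_from_new(r, theta)   # obtained by differentiating the given formulas
A = inverse_2x2(B)               # A[i][j] = d x'^i / d x^j, needs a matrix inversion

t = [0.3, -1.1]   # contravariant components in the (x, y) chart
v = [2.0, 0.5]    # covariant components in the (x, y) chart

# Contravariant rule: t'^i = (d x'^i / d x^j) t^j
t_new = [sum(A[i][j] * t[j] for j in range(2)) for i in range(2)]
# Covariant rule: v'_i = v_j (d x^j / d x'^i)
v_new = [sum(v[j] * B[j][i] for j in range(2)) for i in range(2)]

# The pairing v_i t^i does not depend on the chart.
pairing_old = sum(v[j] * t[j] for j in range(2))
pairing_new = sum(v_new[i] * t_new[i] for i in range(2))

# A single partial derivative is not the reciprocal of the "inverted" one.
dtheta_dy = A[1][1]                  # cos(theta) / r
one_over_dy_dtheta = 1.0 / B[1][1]   # 1 / (r cos(theta))
```

At this point `dtheta_dy` is about 0.25 while `one_over_dy_dtheta` is about 1.0, although the two matrices `A` and `B` are exact inverses of each other, and the two pairings agree up to rounding.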

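Combining (1) and (2) with the discussion of V.1.12, we have
immediately the transformation formulae for tensors of all types. For example if $\mathbf{w}$
is a tensor field of type $\binom{3}{2}$,

$$w'^{i'j'k'}_{l'm'} = w^{ijk}_{lm}\,\frac{\partial x'^{i'}}{\partial x^i}\,\frac{\partial x'^{j'}}{\partial x^j}\,\frac{\partial x'^{k'}}{\partial x^k}\,\frac{\partial x^l}{\partial x'^{l'}}\,\frac{\partial x^m}{\partial x'^{m'}}$$

$$= w^{ijk}_{lm}\,\partial_i(x'^{i'})\,\partial_j(x'^{j'})\,\partial_k(x'^{k'})\,\partial'_{l'}(x^l)\,\partial'_{m'}(x^m)$$

and so forth for other types (Exercise 5): the old definition of $\binom{k}{h}$-tensor
fields.

Sometimes $w'^{i'j'k'}_{l'm'}$ is written as $w^{i'j'k'}_{l'm'}$, but this is as logically peculiar
as writing $[a^{i'}_{j'}]$ for the inverse of $[a^i_j]$ (I.2.08) and for the same reason: what
is the difference between "$w^{i'j'k'}_{l'm'}$ with $i' = 1$, $j' = 2$, $k' = 1$, $l' = 3$, $m' = 2$"
and "$w^{ijk}_{lm}$ with $i = 1$, $j = 2$, $k = 1$, $l = 3$, $m = 2$"?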
4.05. Raising and Lowering Indices. If $M$ has a metric tensor field $\mathbf{G}$,
then this defines $(\mathbf{G}_x)_{\downarrow} : T_xM \to T^*_xM$ and its inverse $(\mathbf{G}_x)_{\uparrow}$ for each $x$,
and maps to "raise and lower indices" (V.1.13) for tensors of higher order
at $x$. These glue together to give maps that alter the variance of tensor fields
(Exercise 6).

The results are altered even more drastically by a change of metric tensor
field $\mathbf{G}$ than a change of metric tensor alters things in the linear case. For
example, two different Riemannian metrics can take the same $\binom{0}{1}$-tensor field
to contravariant fields for which the flows (§6) are crucially different. So be
wary above all of using one metric tensor to raise indices, and another to
lower them (or vice versa). Chalk might become cheese.
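
The warning can be seen already at a single point, in components. The sketch below is an illustration only: it takes two positive definite symmetric matrices as the components $g_{ij}$ of two hypothetical Riemannian metrics at one point of a 2-manifold (the values `G1`, `G2` and the covector `v` are invented for the example), raises the index of the same covector with each, and then lowers with the "wrong" metric.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]


def raise_index(G, v):
    """Components t^i = g^{ij} v_j, where [g^{ij}] is the inverse of [g_ij]."""
    G_inv = inverse_2x2(G)
    return [sum(G_inv[i][j] * v[j] for j in range(2)) for i in range(2)]


def lower_index(G, t):
    """Components v_i = g_ij t^j."""
    return [sum(G[i][j] * t[j] for j in range(2)) for i in range(2)]


# Components of two hypothetical Riemannian metrics at the same point.
G1 = [[1.0, 0.0], [0.0, 1.0]]
G2 = [[1.0, 0.0], [0.0, 4.0]]

v = [1.0, 1.0]   # covariant components of one and the same covector

t1 = raise_index(G1, v)                        # contravariant vector via G1
t2 = raise_index(G2, v)                        # a different vector via G2
round_trip = lower_index(G2, raise_index(G2, v))   # same metric both ways
mixed = lower_index(G1, raise_index(G2, v))        # raise with G2, lower with G1
```

Raising and lowering with the same metric returns the original components, while `mixed` is a different covector from `v`. For metric tensor fields the matrices would vary from point to point, but the same comparison applies at each point.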


Exercises VII.4 

1. Let $M$ be a smooth manifold and $(\phi, U)$ a chart on $M$ with $\phi(u) =
(x^1(u), \ldots, x^n(u))$.

a) Show that each $dx^i$ is a smooth field.

b) Prove from this that a covariant vector field $\mathbf{v}$, where $\mathbf{v} = v_i\,dx^i$ on
$U$, is $C^k$ on $U$ if and only if each $v_i : U \to R$ is $C^k$ on $U$.

2. If a cotangent vector $\mathbf{v}_u$ at a point $u \in M$ has $\mathbf{v}_u = v_i\,dx^i(u)$ (notice
that the $v_i$ are in this case just numbers, not maps) show that $\mathbf{v}_u$ is
the derivative at $u$ of the real-valued function

$$v_ix^i : M \to R : u \mapsto v_1(x^1(u)) + \cdots + v_n(x^n(u)) .$$

(This does not imply that a vector field $\mathbf{v}$ with $\mathbf{v}(u) = \mathbf{v}_u$ need
be the differential of the function $v_ix^i$, or of any other.)

3. Show that if $t \in T_uM$, $dx^i(t) = \partial_t(x^i)$ in the notation of 4.02. Deduce
that, writing $t = t^j\partial_j$,

$$dx^i(t^j\partial_j) = t^i$$

and in particular

$$dx^i(\partial_j) = \delta^i_j$$

so that $dx^1, \ldots, dx^n$ and $\partial_1, \ldots, \partial_n$ are dual bases to each other.

4. Deduce from Exercise 1.5b that for any $t \in T_uM$, and real-valued
functions $f$, $g$ on $M$ with their product $fg$, defined as usual by
$(fg)(u) = f(u)g(u)$ for each $u$,

$$(d(fg))(t) = (df(t))\,g(u) + f(u)\,(dg(t)) .$$

Deduce that for $\mathbf{t}$ a vector field, we have the Leibniz Rule

$$(d(fg))(\mathbf{t}) = (df(\mathbf{t}))\,g + f\,(dg(\mathbf{t}))$$

or in the alternative notation of 4.02

$$\mathbf{t}(fg) = \mathbf{t}(f)\,g + f\,\mathbf{t}(g) .$$

5. Write down the transformation formulae for tensors and tensor fields of
types $\binom{2}{0}$, $\binom{0}{2}$ (notice that these latter include the metric tensor fields),
and $\binom{1}{2}$.

6. Define $\mathbf{G}_{\downarrow} : TM \to T^*M$ (equivalently, $\mathbf{G}_{\downarrow} : M \to T_2M$) and $\mathbf{G}_{\uparrow} :
T^*M \to TM$ by $\mathbf{G}_{\downarrow}|_{T_xM} = (\mathbf{G}_x)_{\downarrow}$, $\mathbf{G}_{\uparrow}|_{T^*_xM} = (\mathbf{G}_x)_{\uparrow}$ (so that $(\mathbf{G}_{\downarrow})_x =
(\mathbf{G}_x)_{\downarrow}$, etc.)

a) Prove that $\mathbf{G}_{\downarrow}$, $\mathbf{G}_{\uparrow}$ are diffeomorphisms.

b) Write down the coordinate formulae for raising and lowering various
indices (take your pick, but specify your choice) in tensor fields of type
$({}^{3}{}_{1}{}^{1}{}_{2})$, using the metric $\mathbf{G}$.

7. Show that if $\phi : U \to R^n$ is a chart, the map

$$T^k_hU \to R^{n + n^{(k+h)}} ,$$

taking a tensor at $x$ to the $n$ coordinates of $x$ and its own $n^{(k+h)}$
components, is exactly the chart $D^k_h\phi$ constructed in Exercise 3.4.


5. Curves 

5.01. Definition. A curve or path in a manifold or affine space M is a
(smooth unless otherwise stated) map $c : J \to M$, where J is an interval
in the real line. The interval may be open or closed, finite or infinite, at
either end. If J is $[a, b]$, for some $a < b \in \mathbf{R}$, c is a curve from $c(a)$ to $c(b)$.





If for all choices of $x, y \in M$ there is a curve from x to y, M is
path-connected. ($\mathbf{R} \setminus \{0\}$ for instance is not path-connected, as by the
Intermediate Value Theorem there is no path from -1 to +1.) We shall include
“path-connected”, like “smooth”, in our concept of a manifold, unless
otherwise stated.

Notice that 5.01 is not the notion of “curve” used in elementary geometry;
that refers rather to a set in M, such as the parabola
$\{ (x, y) \mid x^2 = y \} \subset \mathbf{R}^2$. The two curves
$f : \mathbf{R} \to \mathbf{R}^2 : t \mapsto (t, t^2)$ and $g : t \mapsto (2t, 4t^2)$ both have
this set as image, but are different as maps and therefore as curves, in our
sense. In this instance, however, we can “give one in terms of the other”: if
$h : \mathbf{R} \to \mathbf{R} : t \mapsto 2t$, $g = f \circ h$. This leads to

5.02. Definition. If for two curves $f : J \to M$, $g : J' \to M$ there is a
continuous (respectively smooth, affine) bijection $h : J \to J'$ such that
$f = g \circ h$, then f is a continuous (respectively smooth, affine)
reparametrisation of g. In the special affine case where $h(t) = t + m$,
$m \in \mathbf{R}$, we shall call f a constant reparametrisation of g.
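
As a small numerical illustration of Definition 5.02, the sketch below checks at a few sample parameters that the two parametrisations of the parabola discussed above satisfy $g = f \circ h$ with $h(t) = 2t$, so that g is an affine reparametrisation of f. The helper name `compose` and the sample points are illustrative choices, not part of the text.

```python
import math

def f(t):
    # The curve t -> (t, t^2) from the discussion of the parabola.
    return (t, t ** 2)

def g(t):
    # The curve t -> (2t, 4t^2) with the same image set.
    return (2 * t, 4 * t ** 2)

def h(t):
    # Affine bijection R -> R used as the change of parameter.
    return 2 * t

def compose(outer, inner):
    # Hypothetical helper: returns the map t -> outer(inner(t)).
    return lambda t: outer(inner(t))

f_after_h = compose(f, h)
samples = [-2.0, -0.5, 0.0, 0.25, 3.0]

# g and f o h agree (up to rounding) at every sampled parameter value.
agree = all(
    math.isclose(a, b, abs_tol=1e-12)
    for t in samples
    for a, b in zip(f_after_h(t), g(t))
)
```

A check at finitely many points is of course only a sanity test; the identity $f(h(t)) = (2t, (2t)^2) = g(t)$ holds for all t by direct substitution.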

Two curves need not, however, be reparametrisations of each other even
if both injective and with the same image set. For example, consider
$f, g : [0,1[\, \to \mathbf{R}^2$ with $f(t) = (\sin 2\pi t, \cos 2\pi t)$, $g(t) = (\cos 2\pi t, \sin 2\pi t)$.

“Curve” does not imply “not straight”, even when “straight” is defined 
in M (which it is not, for a general manifold): an affine map R —► X for 
X an affine space, for instance, satisfies Definition 5.01. Remember that a 
mathematical term for which a definition is given means exactly, and only, 
what it is defined to mean, independently of ordinary language. 

We shall generally, as above, use t as the “parameter” (name for a typical
point in the domain) of a curve. This is suggested by the notion of a
curve c as specifying a motion through the manifold, with position c(t) at
time t. Sometimes we want to avoid this suggestion of time involvement;
when convenient for this or other reasons we generally replace t by s.

The discussion of maps from $\mathbf{R}$ to an affine space in 1.03 was purely
local, and hence applies equally to curves in M. If we think of t as “time”,
the vector $f^*(t) = D_tf(1)$ introduced there emerges naturally as a “velocity
vector”. (If this is not transparent, think about writing in coordinates the
velocity of a particle moving in $\mathbf{R}^n$.) In general we shall call it the tangent
vector to the curve f at t: not “at f(t)”, as we might have $f(t) = f(t')$
but $f^*(t) \neq f^*(t')$ if f crosses itself (Fig. 5.2). This would give two tangent
vectors “at f(t)”.

Thus far we have a “velocity” but not a “speed”: a non-zero vector in a 
general vector space V has no “size” except in comparison to others in the 
same direction, unless V has a metric tensor. Such a tensor for each tangent 
space, in which the tangent vectors f*(t) are located, is given by a metric 
tensor field G, say, on M.

If G is positive definite (so that M with G is a Riemannian manifold),
$f^*(t) \cdot f^*(t)$ is naturally to be thought of as (length of $f^*(t)$)$^2$. This leads us
to the idea of

$$\int_a^b \sqrt{f^*(t) \cdot f^*(t)}\; dt = l ,$$

say, as the length of the whole curve $f : [a, b] \to M$. (In Euclidean space,
where one has already a notion of “length” for straight curves, one can show
that this integral coincides with the limit obtained by approximating f ever
more finely by polygonal curves; cf. also Exercise 5.) If we define

$$s(k) = \int_a^k \sqrt{f^*(t) \cdot f^*(t)}\; dt$$

then s(k) is the length of $f|_{[a,k]}$ and $s : [a, b] \to [0, l]$ is a smooth surjective
map. If s has a smooth inverse h (which always holds when $f^*(t)$ is not a
zero vector for any t, by Theorem 1.04) then $g = f \circ h : [0, l] \to M$ is a
smooth reparametrisation of f, with

(length of $g|_{[0,k]}$) = k .

Such a curve g is parametrised by arc length.
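
The construction above can be sketched numerically. The following is a minimal illustration, assuming the Euclidean metric on $\mathbf{R}^2$ and taking the parabola $f(t) = (t, t^2)$ on $[0, 1]$ as a sample curve; Simpson's rule and bisection are convenient stand-ins for the integral and for the inverse h, and all function names are illustrative.

```python
import math

def speed(t):
    # sqrt(f*(t) . f*(t)) for f(t) = (t, t^2) with the Euclidean metric.
    return math.sqrt(1.0 + 4.0 * t * t)

def arc_length(a, k, n=2000):
    # Composite Simpson approximation of s(k) = integral from a to k of speed; n is even.
    if k == a:
        return 0.0
    step = (k - a) / n
    total = speed(a) + speed(k)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed(a + i * step)
    return total * step / 3.0

def inverse_arc_length(a, b, target, tol=1e-10):
    # Bisection for h = inverse of s; valid because speed never vanishes,
    # so s is strictly increasing on [a, b].
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if arc_length(a, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, b = 0.0, 1.0
total_length = arc_length(a, b)
# Closed form of the same integral, used only as a cross-check.
exact = 0.5 * math.sqrt(5.0) + 0.25 * math.asinh(2.0)

def g(k):
    # g = f o h: the point reached after travelling arc length k along f.
    t = inverse_arc_length(a, b, k)
    return (t, t * t)
```

By construction the length of g restricted to $[0, k]$ is k (up to the numerical tolerances), which is exactly the defining property of a parametrisation by arc length.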

The length of f is infinite in the “negative direction” if the lengths
of $f|_{[a,b]}$ increase without bound as a takes lower values in J. (Note that by
Exercise 5c an open interval of finite length, like $]-1,1[$, can have a continuous
image of infinite length.) Then we have to choose some $x \in J$ as $s^{\leftarrow}(0)$ and
allow negative k. We call such a curve parametrised by arc length if the
length of $g|_{[t,t']}$ is $t' - t$ for any $t, t'$ in its domain. For any curve of finite
length, then, we have a unique reparametrisation by arc length, while that for
a negatively infinite one is unique only up to a constant reparametrisation.

These concerns explain the classical notation used to specify a metric
tensor field in coordinates: The length of an arbitrary curve is denoted by s, as
above. The curve is written as $(x^1, x^2, \ldots, x^n)$, with these $x^i$ being functions
of a suppressed argument t, and $\frac{dx^1}{dt}, \ldots, \frac{dx^n}{dt}$ are as in 1.03. Then

* $\left(\dfrac{ds}{dt}\right)^2 = g_{ij}\,\dfrac{dx^i}{dt}\,\dfrac{dx^j}{dt}$

with the argument of $g_{ij}(x^1(t), \ldots, x^n(t))$ also suppressed. A typical example
would be written out as

** cfs 2 = (dx 1 ) 2 + \dx^ dx 2 + (dx 2 ) 2 + ((x 1 ) 2 + l)(dx 3 ) 2 , 

giving the $g_{ij}$ explicitly and “multiplying out” the dx's. The ds on the
left was interpreted as the length of an infinitesimal piece of the curve, and
called a “line element”; the $dx^i$ were “infinitesimal displacements” in the
“infinitesimal time interval” dt. Of course $\frac{dx^i}{dt}$ was not defined as a ratio of
infinitesimals, but as a limit, and until recently infinitesimals were not objects
you could safely do algebra with. (For if there is just one “infinity”, oo, 
^ = oo = for any a, b non-zero real numbers, so by the ordinary rules 
of algebra ^ = f for any a, 6. It is not trivial to erect a consistent theory 
of infinitesimals.) This is a good instance of physics’ usage holding onto a 
highly formal and manipulative - and thus abstract - approach, long after 
mathematics had developed a language that was geometric, visualisable and 
essentially concrete. 

Sectarian jibes apart, though, it is clear that ** above is sufficient to
specify *, and gives $G_{f(t)}(f^*(t), f^*(t))$ for any f and t. Since any tangent
vector can arise as an $f^*(t)$ (a point we elaborate in VIII.§1) we have $G_x(v, v)$
for any $x \in M$, $v \in T_xM$, and hence $G_x(u, v)$ for $u, v \in T_xM$ by the
polarisation identity (Exercise IV.1.7d). Thus specifying the “line element”
$ds^2$ gives the metric tensor, in a single equation rather than a separate formula
for each $g_{ij}$. If the chart used makes the $\partial_i$ everywhere orthogonal (though
they cannot in general be orthonormal, as we see in Chap. X) it is much the
most succinct way to write down a particular metric tensor in coordinates,
and we shall use it freely.
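
The step from the quadratic form $v \mapsto G_x(v, v)$ to the full bilinear form can be made concrete with the polarisation identity $G(u,v) = \tfrac12\big(G(u+v,u+v) - G(u,u) - G(v,v)\big)$. The sketch below is only an illustration: the line element used, $ds^2 = (dx^1)^2 + (dx^2)^2 + ((x^1)^2 + 1)(dx^3)^2$, is a hypothetical simplified example chosen for this demonstration, and the function names are not from the text.

```python
def line_element(x, v):
    # Hypothetical example: ds^2 = (dx1)^2 + (dx2)^2 + ((x1)^2 + 1)(dx3)^2,
    # evaluated on the tangent vector v at the point x.
    return v[0] ** 2 + v[1] ** 2 + (x[0] ** 2 + 1.0) * v[2] ** 2

def polarise(q, x, u, v):
    # Polarisation identity: recovers G_x(u, v) from the quadratic form q alone.
    s = [ui + vi for ui, vi in zip(u, v)]
    return 0.5 * (q(x, s) - q(x, u) - q(x, v))

def metric_components(q, x, n):
    # g_ij = G_x(d_i, d_j), using the standard basis vectors of the chart.
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [[polarise(q, x, basis[i], basis[j]) for j in range(n)]
            for i in range(n)]

# Components of the metric at the point with x1 = 2.
g = metric_components(line_element, (2.0, 0.0, 0.0), 3)
```

Because this sample line element has no cross terms, the resulting matrix is diagonal, which is the "everywhere orthogonal" situation in which the $ds^2$ notation is most succinct.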

If G is not positive definite, the “length” $\sqrt{G_x(v, v)}$ for a vector at x
is not a very practical quantity (cf. IV.1.04) and we do better to use $\|\ \|_{G_x}$
(IV.1.06). Even with this we do well to restrict the kinds of curve we examine.

5.03. Definition. A curve $f : J \to M$ in a pseudo-Riemannian manifold is

(i) timelike if $G(f^*(t), f^*(t)) > 0$, $\forall t \in J$

(ii) null, or lightlike if $G(f^*(t), f^*(t)) = 0$, $\forall t \in J$

(iii) spacelike if $G(f^*(t), f^*(t)) < 0$, $\forall t \in J$

(iv) like if timelike, spacelike, or null.
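
A minimal sketch of Definition 5.03 in code, assuming a constant Lorentz metric on $\mathbf{R}^4$ with signature chosen so that timelike vectors have positive square, as in the definition. The names `minkowski`, `classify` and `curve_type` are illustrative, and a curve is represented simply by a finite sample of its tangent vectors.

```python
def minkowski(u, v):
    # Assumed example metric: G(u, v) = u0 v0 - u1 v1 - u2 v2 - u3 v3.
    return u[0] * v[0] - sum(a * b for a, b in zip(u[1:], v[1:]))

def classify(tangent, metric=minkowski):
    # Sign of G(f*(t), f*(t)) for a single tangent vector.
    q = metric(tangent, tangent)
    if q > 0:
        return "timelike"
    if q < 0:
        return "spacelike"
    return "null"

def curve_type(tangents, metric=minkowski):
    # A sampled curve is "like" only if every tangent has the same type;
    # returns that common type, or None if the types are mixed.
    kinds = {classify(t, metric) for t in tangents}
    return kinds.pop() if len(kinds) == 1 else None

# Tangent vectors of a particle at rest, of a light ray, and of a mixed path.
at_rest = [(1.0, 0.0, 0.0, 0.0)] * 3
light_ray = [(1.0, 1.0, 0.0, 0.0)] * 3
mixed = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
```

Here `curve_type(at_rest)` is `"timelike"`, `curve_type(light_ray)` is `"null"`, and `curve_type(mixed)` is `None`, since that sampled curve changes character and so is not like. Note that this sketch does not separately exclude the zero vector.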

We shall generally be interested in the length only of like curves.

5.04. Definition. The length of a curve $f : J \to M$ is

$$L(f) = \int_J \| f^*(t) \|_{G_{f(t)}}\; dt$$

which may - though need not - be infinite if J is not compact (Exercise 5c).

Since the definition of a like curve requires that f*(t) be never zero, we 
may extend the discussion above to get reparametrisations of such curves 
by arc length. The length of a null curve, however, is automatically zero; 
since [0,0] is just a point, we cannot therefore parametrise a null curve by 
arc length. In Chapters XI and XII we shall use a to denote arc length for 
timelike curves. 

5.05. Note. We have defined differentiation rigorously, but not integration. 
We cannot treat integration in general without introducing differential forms, 
which we do not cover in this volume, but for our uses of it in one dimension 
the following will suffice (“integral as anti-differential”). 

An indefinite integral of a function $f : J \to \mathbf{R}$ is a differentiable function
$g : J \to \mathbf{R}$ such that

$$\frac{dg}{dt}(t) = f(t) .$$

If f has such an indefinite integral, the definite integral

$$\int_a^b f(t)\, dt$$

of f from a to b ($a, b \in \mathbf{R}$) is defined as $(g(b) - g(a))$ (cf. Exercise 2). If J is
a closed interval $[a, b]$ we write also

$$\int_J f(t)\, dt = \int_a^b f(t)\, dt .$$

If J is a non-closed interval (say, $\mathbf{R}$ or $[0,1[$), we choose a decreasing sequence
$a_n$ and an increasing sequence $b_n$ such that every $x \in J$ is in $[a_n, b_n]$ for some
n (say, $a_n = -n$, $b_n = +n$ for $\mathbf{R}$, or $a_n = 0$, $b_n = 1 - \frac{1}{n}$ for $[0,1[$). Then
the definite integral of f over J is defined as $\lim_{n \to \infty}(g(b_n) - g(a_n))$ if this
limit exists and is the same for all choices of sequences $a_n$, $b_n$ (cf. Exercise 3).
Otherwise, the integral is non-existent, or divergent. If for any $x \in \mathbf{R}$, however
large, there is a subinterval $J'$ of J such that for any closed subinterval $[a, b]$
containing $J'$ we have

$$\int_a^b f(t)\, dt > x$$

then the integral is positively infinite, and similarly for negatively infinite.

Fig. 5.3

If f has an indefinite integral we say f is integrable.
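
The role of the sequences $a_n$, $b_n$ can be seen in a short numerical sketch. It assumes the standard facts that arctan is an indefinite integral for $1/(1+t^2)$ and sin is one for cos; the particular sequences chosen are illustrative only.

```python
import math

def definite_integral(antiderivative, a, b):
    # "Integral as anti-differential": g(b) - g(a) for an indefinite integral g.
    return antiderivative(b) - antiderivative(a)

# f(t) = 1/(1 + t^2) over J = R, with a_n = -n and b_n = +n:
# the values settle down as n grows, so the integral over R exists.
convergent = [definite_integral(math.atan, -n, n) for n in (10, 1000, 100000)]

# cos over J = R: two admissible choices of sequences give different limits,
# so by the definition above the integral over R does not exist.
two_pi = 2.0 * math.pi
choice_one = [definite_integral(math.sin, -two_pi * n, two_pi * n)
              for n in range(1, 5)]
choice_two = [definite_integral(math.sin, -two_pi * n, two_pi * n + math.pi / 2)
              for n in range(1, 5)]
```

The first list approaches $\pi$, while the two lists for cos stay near 0 and near 1 respectively, which is the kind of disagreement that Exercise 3 asks the reader to produce.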

The proof of the existence by these definitions of the integral we have
called “length” can be direct only when the image f(J) of a curve is a subset
of an affine straight line (Exercise 5). The reason is the essential triviality
of defining integration as the reverse of differentiation. Apart from applying
only to continuous functions, it does not say anything about the way an
integral is a glorified sum. (The very symbol ∫ is just an olde Engliſhe
“s” for “sum”). The integral of a function is more fundamentally the “area
under the curve” (Fig. 5.3a) defined as a limit of approximating sums of
rectangular areas. The length of a curve is the limit of the sums of the
lengths of the straight bits in a polygonal approximation (Fig. 5.3b), and so
on. Then the fact that the integral exists, and is an “anti-differential” when
f is continuous, says something significant, and requires work to prove. Any
Maths student reading this has a proof among his first Analysis lecture notes.
Physics students with the mathematician’s hunger for a proof are referred to
any reliable introductory Real Analysis text, for example [Moss and Roberts].


Exercises VII. 5 

1. a) If $f : [a, b] \to \mathbf{R}$ is $C^1$ at $t \in\, ]a, b[$, and $f(s) \le f(t)$ for all $s \in [a, b]$,
show that $f^*(t) = 0$. (If not, $D_tf$ is injective. Apply 1.05 to contradict
the assumed maximum for f at t.)

b) Deduce that $f^*$ similarly is 0 at a differentiable minimum, strict (so
$f(s) > f(t)$ $\forall s \neq t$), or otherwise.

c) If $f : [a, b] \to \mathbf{R}$ is $C^1$ and $f(a) = f(b) = k$ show that f has a maximum
or a minimum at some $t \in\, ]a, b[$. (If $f([a, b]) = \{k\}$, set $t = \frac12(a + b)$;
if not use VI.4.11.)

d) Deduce from a)-c) that if $f : [a, b] \to \mathbf{R}$ is $C^1$ and $f(a) = f(b)$, then
there exists $t \in\, ]a, b[$ with $D_tf = 0$.

e) (Not used in the book.) Extend d) to the case that f is differentiable
but not $C^1$. (Replace Theorem 1.05, which fails, by a proof that an
injective derivative at t for $f : \mathbf{R} \to \mathbf{R}$ implies that f takes values on
both sides of f(t).)

2. a) Deduce from 1c) that if $f : [a, b] \to \mathbf{R}$ is $C^1$ then $f(a) \neq f(b) \Rightarrow$
$f^*(t) \neq 0$ for some $t \in\, ]a, b[$.

b) Deduce that if $f^*(t) = 0$ for all $t \in\, ]a, b[$, then f is constant.

c) Deduce that if f, g are two indefinite integrals for a continuous function
$h : J \to \mathbf{R}$, J any interval, then $g(t) = f(t) + m$ $\forall t$, where m is
some real constant.

d) Deduce that any indefinite integral for h gives the same definite integral
from any a to any b, if h is defined everywhere in $[a, b]$.

e) If $h(t) = -\frac{1}{t^2}$, $g(t) = \frac{1}{t}$, $f(t) = \frac{1}{t} + \frac{t}{|t|}$, then

$$\frac{df}{dt} = \frac{dg}{dt} = h$$

whenever f, g and h are defined. (Thus two “anti-differentials” need
not in general differ everywhere by the same constant. The rule “one
integration, one constant” is valid only if the domain of definition of
the functions involved is path-connected, cf. 5.01.)

3. Assuming that sin is an indefinite integral for cos on $\mathbf{R}$, show that
cos has no integral over all $\mathbf{R}$ by producing sequences $a_n \to -\infty$,
$b_n \to +\infty$ such that

(i) $(\sin(b_n) - \sin(a_n)) = k$ $\forall n$, for any given constant $k \in [-2, 2]$, or

(ii) $\lim_{n \to \infty}(\sin(b_n) - \sin(a_n))$ does not exist.

4. If $f : J \to \mathbf{R}$ is integrable and $a, b, k \in J$, show that

$$\int_a^b f(t)\, dt + \int_b^k f(t)\, dt = \int_a^k f(t)\, dt .$$


5. a) Show that if $f : J \to \mathbf{R}$ is $C^1$ and injective, the length of f using
Definition 5.04 with the usual Riemannian metric on $\mathbf{R}$ is exactly the length
of the set f(J), defined in the usual way. (Just combine definitions.)

b) Deduce that if f is a smooth injective curve in Euclidean space whose
image is a subset of a straight line, then even if f is not affine the
length defined in 5.04 coincides with the usual length of the set f(J).

c) The curve / : ]—1,1[ —► R : t *-♦ has infinite length, and g : 

R R : < h has length 2. (Since g is not injective, apply 
Exercise 4.) 


6. Vector Fields and Flows 


6.01. Examples. Given a tangent vector at each point in a region U, it is
natural to try to “join the arrows up”: fill U with curves, so that at every
point x the curve through x is in the direction pointed by the vector at x.
Familiar examples are the “lines of force” defined by a magnetic field (the
vector field being defined at each point by the effect on a hypothetical “free
north pole”), and the “stream lines” defined by the velocity vector field of
a moving fluid. In steady flow, the stream lines are realised physically as
the paths followed by particles in the fluid. Moreover if we express such a
movement by a curve c in U with c(t) = position at time t, the velocity at
the point c(t) is exactly the tangent vector $c^*(t)$, not merely in the same
direction. Can we produce such a set of curves for an arbitrary vector field?

First, let us consider some examples. 

On $\mathbf{R}$, if we have a vector field

$$\mathbf{v}(x) = \left(\tfrac12 + 2x^2 + \tfrac12\sqrt{1 + 4x^2}\right) e_1(x)$$

$= v(x)\,e_1(x)$ for short,

the only curve c with $c^*(t) = \mathbf{v}(c(t))$ is

$$c(t) = \frac{t}{1 - t^2} ,$$

up to a constant change in parameter or restriction to a small domain
(Exercise 1a). There is no way to extend c to a continuous map with domain all
of $\mathbf{R}$.
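
A quick numerical check of this example is sketched below. It assumes that the field is read as $v(x) = \tfrac12 + 2x^2 + \tfrac12\sqrt{1+4x^2}$ and the curve as $c(t) = t/(1-t^2)$ on $]-1,1[$ (the reading adopted here, for which the two are consistent); the derivative of c is approximated by a central difference, and the sample parameters are arbitrary.

```python
import math

def v(x):
    # Assumed coefficient of e_1 in the vector field on R.
    return 2.0 * x * x + 0.5 + 0.5 * math.sqrt(1.0 + 4.0 * x * x)

def c(t):
    # Candidate solution curve, defined only for -1 < t < 1.
    return t / (1.0 - t * t)

def c_dot(t, h=1e-6):
    # Central-difference approximation to the tangent vector c*(t).
    return (c(t + h) - c(t - h)) / (2.0 * h)

# |c*(t) - v(c(t))| at a few parameters inside ]-1, 1[.
residuals = [abs(c_dot(t) - v(c(t))) for t in (-0.9, -0.5, 0.0, 0.3, 0.8)]

# c(t) grows without bound as t approaches 1, so c cannot be extended to all of R.
near_end = [c(1.0 - 10.0 ** (-k)) for k in (1, 3, 5)]
```

The residuals are at the level of the finite-difference error, and the values in `near_end` increase rapidly, which illustrates why only a solution on a limited domain can be expected.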

So in general we cannot expect to do better than find a curve
$c : \,]-\varepsilon, \varepsilon[\, \to U$ with c(0) a given $x \in M$, $c^*(t) = v(c(t))$ for all
$t \in\, ]-\varepsilon, \varepsilon[$, for some $\varepsilon$.

Moreover, we cannot expect to use the same $\varepsilon$ for the curves through all
the different points in M. Let M be $\mathbf{R}^2$ and put (cf. 4.01, 4.02 for notation)

$$\mathbf{w}(x, y) = v\big((1 + y^2)\,x\big)\,\partial_x + 0\,\partial_y$$

with the function v as before. Then the unique curve c through $(0, y_0)$ with
$c^*(t) = \mathbf{w}(c(t))$ (again up to a constant or restriction) is, by Exercise 1b,

$$c : \,]-\varepsilon, \varepsilon[\, \to \mathbf{R}^2 : t \mapsto \left(\frac{\varepsilon^2 t}{\varepsilon^2 - t^2},\ y_0\right), \qquad\text{where } \varepsilon = \frac{1}{1 + y_0^2} .$$

Thus no one $\varepsilon > 0$ will do for all points, since for any given choice a
large enough $y_0$ requires a smaller one. The best we can expect in general is
a result local both in $\mathbf{R}$ (curves with limited domain) and in M (the choice
of $\varepsilon$ depending on where in M we are).

The reader should have noticed by now that we are talking about solutions
of differential equations. With the field $\mathbf{w}$ on $\mathbf{R}^2$ above, for example, the
equation $c^*(t) = \mathbf{w}(c(t))$ is equivalent to

$$\frac{dc^1}{dt} = v\big((1 + (c^2)^2)\,c^1\big) , \qquad \frac{dc^2}{dt} = 0$$

in the notation of 1.03. Hence we use the language:-


6.02. Definition. A solution curve, or integral curve, of a vector field v on
a manifold M or on a region U in M is a curve $c : J \to M$ such that
$c^*(t) = v(c(t))$, $\forall t \in J$.

A first order differential equation on M is a vector field on M.

Thus differential equations are as essentially geometric as linear algebra,
though from many treatments you would guess it for neither. An excellent,
highly pictorial (and cheap) introduction to the geometric point of view on
differential equations is [Schwarzenberger (1)], based on a first-year
undergraduate course.
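
To make the identification of differential equations with vector fields concrete, here is a minimal sketch of approximating an integral curve numerically. The classical fourth-order Runge-Kutta scheme is used purely as a convenient stand-in for exact solution, and the rotation field on $\mathbf{R}^2$ is an assumed example whose integral curve through (1, 0) is known to be $(\cos t, \sin t)$.

```python
import math

def rk4_step(field, point, dt):
    # One classical Runge-Kutta step along the vector field.
    def shifted(p, k, scale):
        return [pi + scale * ki for pi, ki in zip(p, k)]
    k1 = field(point)
    k2 = field(shifted(point, k1, dt / 2.0))
    k3 = field(shifted(point, k2, dt / 2.0))
    k4 = field(shifted(point, k3, dt))
    return [p + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for p, a, b, c, d in zip(point, k1, k2, k3, k4)]

def integral_curve(field, start, t_end, steps=1000):
    # Approximates c(t_end) for the solution curve with c(0) = start.
    point = list(start)
    dt = t_end / steps
    for _ in range(steps):
        point = rk4_step(field, point, dt)
    return point

def rotation(p):
    # Assumed example field v(x, y) = -y d_x + x d_y.
    x, y = p
    return [-y, x]

end = integral_curve(rotation, (1.0, 0.0), math.pi / 2.0)
# The same curve started from a slightly perturbed initial point.
end_perturbed = integral_curve(rotation, (1.0 + 1e-6, 0.0), math.pi / 2.0)
gap = math.dist(end, end_perturbed)
```

The small value of `gap` illustrates, for this one field, the continuous dependence on initial conditions that matters in any real calculation.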

“Solving a differential equation for given initial conditions $x^i = x^i_0$, at
time t = 0” now translates exactly into finding a solution curve c of a vector
field v, with c(0) the point $x_0$ labelled $(x^1_0, \ldots, x^n_0)$ by the chart being used.
In any real calculation we do not know $x_0$ exactly, so exact solutions for time
t are worthless unless c(t) depends continuously on the $x_0$ through which
the curve c is required to pass (compare VI§1). Very conveniently, if v is
continuously differentiable, it always does. First we define a flow.

6.03. Definition. A $C^k$ local flow for a vector field v on M is a $C^k$ map

$$\phi : U \times\, ]-\varepsilon, \varepsilon[\, \to M$$

where U is an open set in M and $\varepsilon$ a positive real number, such that

(i) $\phi(y, 0) = y$, $\forall y \in U$.

(ii) For any $y \in U$, if we set $c(t) = \phi(y, t)$ for $t \in\, ]-\varepsilon, \varepsilon[$ then
$c : \,]-\varepsilon, \varepsilon[\, \to M$ is a solution curve of v.

The local flow is on U, and is around any $x \in U$. We now have the
language to state

6.04. Theorem. If v is a $C^k$ vector field on a manifold M, there is such a
$C^k$ local flow for v around every $x \in M$, which is unique in the following
sense:

(i) If $\phi' : U' \times\, ]-\varepsilon', \varepsilon'[\, \to M$ is another local flow for v, then setting
$\varepsilon'' = \min(\varepsilon, \varepsilon')$ and $U'' = U \cap U'$ we have

$$\phi|_{U'' \times ]-\varepsilon'', \varepsilon''[} = \phi'|_{U'' \times ]-\varepsilon'', \varepsilon''[} .$$

So $\phi$ and $\phi'$ agree where they are both defined.

(ii) If f is a solution curve with $f(t) = x$, then $f(s) = \phi(x, s - t)$
whenever both sides are defined. (Thus there is essentially just one solution
curve through x, up to constant reparametrisation. This need not be true if
v is merely continuous: cf. Exercise 2.)

We give a recent, simple geometric proof of this theorem in the Appendix.
Our use of the result depends only on what it says, not how it is proved, so
the reader will miss nothing essential to the rest of the book by taking the
theorem on trust or from a proof already encountered - though he will miss
a nice proof. □

6.05. Corollary. Let $\phi : U \times J \to M$ be a $C^k$ local flow for v. If for $t \in J$
we define

$$\phi_t : U \to M : x \mapsto \phi(x, t)$$

then $\phi_{t+s} = \phi_t \circ \phi_s$ whenever t, s and t + s are all in J.

Proof. Let $f(t) = \phi_t(\phi_s(x)) = \phi(\phi_s(x), t)$. Then f is a solution curve of v
and hence by the theorem a constant reparametrisation of the solution curve
g defined by $g(r) = \phi(x, r)$. Since $f(0) = g(s)$, we must therefore have

$$f(t) = g(t + s) ,$$

and hence

$$(\phi_t \circ \phi_s)(x) = f(t) = g(t + s) = \phi_{t+s}(x) .$$
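
The composition rule of 6.05 can be checked on a flow known in closed form. As an assumed example (not taken from the text), take the field $v(x) = x^2 e_1(x)$ on $\mathbf{R}$, for which $\phi(x, t) = x/(1 - xt)$ satisfies $\phi(x, 0) = x$ and $\partial\phi/\partial t = \phi^2$ wherever $xt < 1$; like the example in 6.01, it is only a local flow.

```python
import math

def flow(x, t):
    # phi(x, t) = x / (1 - x t): local flow of the assumed field v(x) = x^2 e_1,
    # defined only while x * t < 1.
    return x / (1.0 - x * t)

def compose_then_compare(x, s, t):
    # Returns (phi_t(phi_s(x)), phi_{t+s}(x)), which 6.05 says should agree.
    return flow(flow(x, s), t), flow(x, s + t)

lhs, rhs = compose_then_compare(0.5, 0.3, 0.2)

# 6.06 in miniature: phi_{-t} undoes phi_t where both are defined.
round_trip = flow(flow(0.5, 0.4), -0.4)
```

Both `lhs` and `rhs` come out close to 2/3, and `round_trip` returns to the starting point 0.5, in line with the diffeomorphism statement of 6.06.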

6.06. Corollary. Each set $\phi_t(U)$ is an open set in M, and, giving U and
$\phi_t(U)$ the differential structure restricted from M, each map
$\phi_t : U \to \phi_t(U)$ is a diffeomorphism.

Proof. By Exercise 3.2b each map $i_t : U \to U \times J : x \mapsto (x, t)$ is smooth, so
the composites $\phi_t = \phi \circ i_t$ are $C^k$.

By 6.05 $\phi_t \circ \phi_{-t}(x) = \phi_0(x) = x$, so each $\phi_t$ has the $C^k$ inverse $\phi_{-t}$
(modulo minor technicalities about domains of definition). The result follows. □

We shall confine our attention largely to local flows where v is non-zero 
on U, as we shall not be needing results on the behaviour of flows around 
zeros of v. (Some samples of the latter are shown in Fig. 6.1; the study of 
these is a large part of the theory of dynamical systems.) For such non-zero 
fields we have the following “straightening out” locally for a flow. 

6.07. Lemma. Let M be a manifold on an affine space X with vector space T.

If v is a smooth vector field on M and $x \in M$ has $v(x) \neq 0$, then there
is a local flow $\phi : U \times J \to M$ for v around x, a chart $\psi : U \to X$, and a
vector $w \in T$, such that

$$\psi(\phi(y, t)) = \psi(\phi(y, 0)) + tw$$

whenever both sides are defined. (Thus the flow looks locally like a family of
translations in the direction of w, by the chart.)

Fig. 6.2

Proof. By continuity, x has a neighbourhood $V_1$ on which v is non-zero;
by 6.04 v has a local flow $\phi : V_2 \times J \to M$ around x; by the definition of a
manifold there is a chart $\alpha : V_3 \to X$. Let $\alpha(x) = y$, $D\alpha(v(x)) = w$. Choose a
linear functional $f : T \to \mathbf{R}$ with $f(d_{\alpha(x)}(w)) > 0$; by continuity x has
a neighbourhood $V_4 \subseteq V_3$ with $f(d_{\alpha(z)}(D\alpha(v(z)))) > 0$ for all $z \in V_4$. Let
$V_1 \cap V_2 \cap V_4 = V$, and denote the restrictions of $\phi$ to $V \times J$ and $\alpha$ to V
likewise by $\phi$ and $\alpha$. We now have the situation of Fig. 6.2 where all vectors
$v(z) \in T_zM$ with $\alpha(z)$ in the affine hyperplane $K = y + \ker f$ are carried to
vectors $D\alpha(v(z))$ pointing across K in the same way. Thus no solution curve
in V crosses $\alpha^{\leftarrow}(K)$ more than once. So if we set $U = V \cap \phi(\alpha^{\leftarrow}(K) \times J)$
and

$$\eta : (\alpha(U) \cap K) \times J \to M : (k, t) \mapsto \phi_t(\alpha^{\leftarrow}(k))$$

we have a local inverse $\gamma : U \to K \times J$ (that is, $\gamma(\eta(k, t)) = (k, t)$ when
defined) with $\gamma|_{\alpha^{\leftarrow}(K)} = \alpha|_{\alpha^{\leftarrow}(K)}$. Since $\eta$ is evidently smooth and $D\eta$
always invertible, $\gamma$ is also smooth. Define $\theta(k, t) = k + tw$ and $\psi = \theta \circ \gamma$.
Then $\phi|_{U \times J}$, $\psi$ and w satisfy the conditions above. □


Exercises VII.6 

1. a) Show that if $c(t) = \frac{t}{1 - t^2}$, then $c^*(t) = \mathbf{v}(c(t))$, where $\mathbf{v}$ is the vector
field on $\mathbf{R}$ introduced in 6.01.

b) Show that the curves in $\mathbf{R}^2$ given in 6.01 are solutions of $c^*(t) =
\mathbf{w}(c(t))$.

2. If the vector field v on R has t>(x) = x 3 ei(ar), show that v is not 
differentiable and that c(t) = and c(t) = 0 are both solution 
curves for t> through 0. 


7. Lie Brackets 

By 6.07 we can make one vector field look locally like translation, where
it is non-zero. It will be important later to know when we can do it for
several vector fields at once. Evidently this need not be true in general, since
translations always commute ($(x + t) + s = (x + s) + t$), while there is no
reason for flows to. (For instance if M is $\mathbf{R}^2$, $\phi((x, y), t) = (x + ty, y)$ and
$\psi((x, y), t) = (x, y + t)$ we have $\psi_1\phi_1(0, 0) = (0, 1)$ while $\phi_1\psi_1(0, 0) = (1, 1)$.
To what vector fields do $\phi$ and $\psi$ correspond?) It turns out that there is a
purely local condition on the vector fields which decides the question for the
flows.

We have mentioned, in 4.02, the view of a vector field v as a derivation
on functions: $(v(f))(x) = df(v(x))$. If we have two vector fields v, w we
may or may not have $v(w(f)) = w(v(f))$ for all functions f. (Consider the
vector fields of the flows $\phi$, $\psi$ above, and the function $f(x, y) = x + y$.) It
turns out that we do have this property exactly when the corresponding flows
commute.

7.01. Definition. The Lie bracket or commutator of two vector fields v, w
on a manifold M is the unique vector field, denoted $[v, w]$, such that

$$[v, w](f) = v(w(f)) - w(v(f)) ,$$

for all smooth $f : M \to \mathbf{R}$.

If we had shown that every derivation corresponds to a unique vector
field, this would guarantee the existence and uniqueness of $[v, w]$, since

$$\begin{aligned}
v(w(fg)) - w(v(fg)) &= v(g\,w(f) + f\,w(g)) - w(g\,v(f) + f\,v(g)) \\
&= v(g)w(f) + g\,v(w(f)) + v(f)w(g) + f\,v(w(g)) \\
&\quad - w(g)v(f) - g\,w(v(f)) - w(f)v(g) - f\,w(v(g)) \\
&= g\big(v(w(f)) - w(v(f))\big) + f\big(v(w(g)) - w(v(g))\big)
\end{aligned}$$

so we have a new derivation. As it is, the most direct method is to use
coordinates (Exercise 2). (Note that $f \mapsto v(w(f))$ does not generally give
a derivation - consider, say, the examples above on $\mathbf{R}^2$ - and so cannot
correspond to a vector field. It is very special that $f \mapsto v(w(f)) - w(v(f))$
does.)
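
The coordinate route can be sketched numerically using the formula $[v, w]^j = v^i\partial_i w^j - w^i\partial_i v^j$ of Exercise 2a. The example below takes $v = y\,\partial_x$ and $w = \partial_y$ on $\mathbf{R}^2$, which one can check generate the two non-commuting flows $(x + ty, y)$ and $(x, y + t)$; partial derivatives are approximated by central differences, and the function names are illustrative.

```python
def partial(func, point, i, h=1e-5):
    # Central-difference approximation to d_i of every component of func.
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return [(a - b) / (2.0 * h) for a, b in zip(func(forward), func(backward))]

def lie_bracket(v, w, point):
    # Components [v, w]^j = v^i d_i w^j - w^i d_i v^j at the given point.
    n = len(point)
    v_here, w_here = v(point), w(point)
    result = [0.0] * n
    for i in range(n):
        dw = partial(w, point, i)
        dv = partial(v, point, i)
        for j in range(n):
            result[j] += v_here[i] * dw[j] - w_here[i] * dv[j]
    return result

def v(p):
    # v = y d_x, whose flow is (x, y) -> (x + t y, y).
    return [p[1], 0.0]

def w(p):
    # w = d_y, whose flow is (x, y) -> (x, y + t).
    return [0.0, 1.0]

bracket = lie_bracket(v, w, [0.3, -0.7])
```

The result is approximately $(-1, 0)$, that is $[v, w] = -\partial_x$: the bracket does not vanish, matching the failure of the two flows to commute.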

7.02. Theorem. Let v, w be vector fields defined in a region U of a manifold
M, with local flows $\phi$, $\psi$ on U for v and w. Then $\phi$ and $\psi$ satisfy

$$\phi_t \circ \psi_s = \psi_s \circ \phi_t$$

wherever defined if and only if

$$[v, w]_x = 0$$

for all $x \in U$.

Proof. If $\phi_t \circ \psi_s = \psi_s \circ \phi_t$ everywhere, for any function f on U then

$$\begin{aligned}
v(w(f)) - w(v(f)) = \lim_{(s,t) \to (0,0)} \frac{1}{st}\Big(\, & f \circ \psi_s \circ \phi_t - f \circ \phi_t - f \circ \psi_s + f \\
& - f \circ \phi_t \circ \psi_s + f \circ \phi_t + f \circ \psi_s - f \,\Big) = 0 .
\end{aligned}$$

Conversely, if at $x \in M$ both v and w are zero, then $\phi_t(x) = \psi_s(x) = x$,
$\forall s, t$, so the result is trivial. If, say, $v(x) \neq 0$ we have by continuity a
neighbourhood of x in which v is non-zero and, applying 6.07 in coordinate
form, a chart $\theta$ in which $v = \partial_1$, $\phi_t(x^1, \ldots, x^n) = (x^1 + t, \ldots, x^n)$. If $[v, w] =
0$, then

$$0 = v^i\partial_i w^j - w^i\partial_i v^j , \quad \forall j,$$

Fig. 7.1

by Exercise 2a. That is, $0 = \partial_1 w^j$, $\forall j$, so the $w^j$ are constant in the
$x^1$-direction. Hence for any solution curve $c = (c^1, \ldots, c^n)$ for w, the translated curve
$(c^1 + t, c^2, \ldots, c^n)$ is also a solution curve where it lies in the range of the
chart $\theta$. Hence $\psi_s(x + te_1) = \psi_s(x) + te_1$, i.e. $\psi_s \circ \phi_t = \phi_t \circ \psi_s$. □


7.03. Language. The equation $[v, w] = 0$ is thus the “infinitesimal version”
of $\phi_t \circ \psi_s = \psi_s \circ \phi_t$. We say that v and w commute in a region U when their
Lie bracket vanishes there. A larger set of vector fields is said to commute if
any two of them do.

$[v, w]$ can in fact be defined as the infinitesimal failure of $\phi$ and $\psi$ to
commute, just as $df$ is the infinitesimal failure of f to be constant. In an
affine space this takes the form

$$[v, w]_x = \lim_{h \to 0} \frac{\psi_{-h}\phi_{-h}\psi_h\phi_h(x) - x}{h^2}$$

which clearly vanishes if $\phi_t \circ \psi_s = \psi_s \circ \phi_t$ always, since this implies
$\psi_{-s}\phi_{-t}\psi_s\phi_t = \psi_{-s}\psi_s\phi_{-t}\phi_t = I$ where defined. In a general manifold the
equivalent definition is a little more complicated.

We shall omit the proof that this definition is equivalent to 7.01, as we
shall not need to use it (indeed, we have yet to see a use for it except as
motivation).
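
The "infinitesimal failure to commute" can be tried out on the two explicit flows on $\mathbf{R}^2$ used at the start of this section. The sketch assumes the limit is read as the displacement $\psi_{-h}\phi_{-h}\psi_h\phi_h(x) - x$ divided by $h^2$, with $\phi_h$ applied first; that ordering is an assumption of this illustration.

```python
def phi(p, t):
    # Flow of v = y d_x: (x, y) -> (x + t y, y).
    return (p[0] + t * p[1], p[1])

def psi(p, t):
    # Flow of w = d_y: (x, y) -> (x, y + t).
    return (p[0], p[1] + t)

def commutator_quotient(p, h):
    # (psi_{-h} phi_{-h} psi_h phi_h (p) - p) / h^2, with phi_h applied first.
    q = psi(phi(psi(phi(p, h), h), -h), -h)
    return tuple((qi - pi) / (h * h) for qi, pi in zip(q, p))

start = (0.3, -0.7)
quotients = [commutator_quotient(start, h) for h in (0.1, 0.01, 0.001)]
```

For these flows the loop moves the point by exactly $(-h^2, 0)$, so every quotient is (up to rounding) $(-1, 0)$, in agreement with the value $-\partial_x$ that the coordinate formula of Exercise 2a gives for the bracket of $y\,\partial_x$ and $\partial_y$.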

7.02 gives us a result that will be crucial when we come to decide which 
spaces are intrinsically “curved” in Chapter X. 

7.04. Theorem. Suppose around a point x in an n-manifold M we have n
vector fields $v_1, \ldots, v_n$ such that for all y in a neighbourhood U of x we have
$v_1(y), \ldots, v_n(y)$ linearly independent and $[v_i, v_j]_y = 0$ $\forall i, j = 1, \ldots, n$.

Then there is a chart $\psi : U \to \mathbf{R}^n$ around x with respect to which $v_i = \partial_i$,
$i = 1, \ldots, n$ and the corresponding flows $\phi^1, \ldots, \phi^n$ have

$$\phi^i_t(x^1, \ldots, x^n) = (x^1, \ldots, x^i + t, \ldots, x^n) \qquad \forall i, t.$$

Proof. If $\phi^1, \ldots, \phi^n$ are defined on $V_i \times J_i$, $i = 1, \ldots, n$ let

$$\theta(t^1, \ldots, t^n) = \phi^1_{t^1}\phi^2_{t^2} \cdots \phi^n_{t^n}(x)$$

where defined, and denote its domain of definition by $J \subseteq J_1 \times \cdots \times J_n \subseteq \mathbf{R}^n$.
Evidently J is open, and $\theta$ is smooth by the smoothness of the $\phi^i$. Now $D\theta$
takes the vector $e_i(t^1, \ldots, t^n)$ to $c^*(0)$ at $\theta(t^1, \ldots, t^n)$, where we set

$$c(s) = \phi^1_{t^1} \cdots \phi^i_{t^i + s} \cdots \phi^n_{t^n}(x)
= \phi^i_s\big(\phi^1_{t^1} \cdots \phi^n_{t^n}(x)\big) \quad\text{by 6.05 and 7.02}$$

so c is a solution curve for $v_i$ through $\theta(t^1, \ldots, t^n)$, and hence $c^*(0) =
v_i(\theta(t^1, \ldots, t^n))$.

In particular, $D\theta$ takes the standard basis $e_1, \ldots, e_n$ for $T_{(0,\ldots,0)}\mathbf{R}^n$ to
$((v_1)_x, \ldots, (v_n)_x)$. That is a linearly independent subset of the n-dimensional
space $T_xM$, by assumption, so $D_{(0,\ldots,0)}\theta$ is an isomorphism. Hence by the
Inverse Function Theorem (1.04) there is a neighbourhood U of x and a local
diffeomorphism $\psi : U \to \mathbf{R}^n$ with $\psi \circ \theta = I$ where defined, which can be used as a chart.
With respect to this chart, $v_i = \partial_i$, $i = 1, \ldots, n$ as required.

Notice that by Exercise 2c this theorem gives a necessary as well as
sufficient condition for $v_1, \ldots, v_n$ to have a realisation as $\partial_1, \ldots, \partial_n$ in some
chart. □


Exercises VII. 7 


1. a) Show that if for $f : \mathbf{R}^n \to \mathbf{R}$ the partial derivatives $\partial_i f$, $\partial_j f$ and
$\partial_i(\partial_j f)$ exist and are continuous, then so does $\partial_j(\partial_i f)$ and it is equal
to $\partial_i(\partial_j f)$. Hint: show that both are equal to

$$\lim_{(h,k) \to (0,0)} \frac{1}{hk}\left(\begin{aligned}
& f(x^1, \ldots, x^i + h, \ldots, x^j + k, \ldots, x^n) \\
& - f(x^1, \ldots, x^i + h, \ldots, x^j, \ldots, x^n) \\
& - f(x^1, \ldots, x^i, \ldots, x^j + k, \ldots, x^n) \\
& + f(x^1, \ldots, x^i, \ldots, x^j, \ldots, x^n)
\end{aligned}\right) .$$

This is known as the equality of second mixed partials (cf. also
Exercise X.2.1).

b) Show that the continuity conditions above cannot be dropped, by
proving that if $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ then $\partial_1 f$, $\partial_2 f$, $\partial_1\partial_2 f$
and $\partial_2\partial_1 f$ all exist, but that $(\partial_1\partial_2 f)(0, 0) \neq (\partial_2\partial_1 f)(0, 0)$.

c) Show that existence and continuity of the $\partial_i f^j$ for $f : \mathbf{R}^n \to \mathbf{R}^m$ imply
existence and continuity of $Df$. (Show that the linear map defined by
the Jacobian matrix in 1.03 satisfies the definition of $Df$, if the $\partial_i f^j$
are continuous.)

2. a) Use Exercise 1a to show that if two vector fields v, w on M have
$v = v^i\partial_i$, $w = w^i\partial_i$ with respect to some chart $U \to \mathbf{R}^n$ then the
vector field $u = (v^i\partial_i w^j - w^i\partial_i v^j)\partial_j$ has $u(f) = v(w(f)) - w(v(f))$
for all smooth $f : U \to \mathbf{R}$, and that it is the only vector field with this
property. (Hint: the coordinate functions $x^i : U \to \mathbf{R}$ are smooth.)

b) Deduce that u does not depend on the chart used to define it.

c) Deduce from 1a that if $\partial_1, \ldots, \partial_n$ are the basis vector fields given
by any chart of an at least $C^2$ manifold, then $[\partial_i, \partial_j] = 0$ for $i, j =
1, \ldots, n$.


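3. If $\phi$ is a local flow for v around x and $f : M \to \mathbf{R}$ is a smooth function,
then

$$(v(f))(x) = \lim_{h \to 0} \frac{f(\phi_h(x)) - f(x)}{h} .$$

4. If $\phi$ is a local flow for u around x and v is another vector field, then

$$[u, v]_x = \lim_{h \to 0} \left( \frac{(D\phi_h)^{\leftarrow}(v_{\phi_h(x)}) - v_x}{h} \right) .$$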


5. If u, v are vector fields and $f : M \to \mathbf{R}$, show by comparing effects on
a typical $g : M \to \mathbf{R}$ that

$$[u, fv] = u(f)\,v + f\,[u, v] .$$

6. For any vector fields u, v, w on M, prove similarly the Jacobi identity:

$$[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0 .$$

VIII. Connections and
Covariant Differentiation


“Whither the spirit was to go, they went; 
and they turned not as they went.” 

Ezekiel 1.12 


1. Curves and Tangent Vectors 

We have remarked (VII.5.02) that any vector in TM can arise as a tangent 
vector to a curve. It can moreover be defined in this way; Exercises 1- 
3 outline this construction of the tangent bundle. This way of looking at 
tangent vectors is central to the notation and thinking of this chapter, so if 
you do not do these exercises in full, at least be sure you are clear what is 
asserted in them. The tangent bundle is like compactness: not to be grokked 
in fullness from any one point of view. 


Exercises VIII. 1 

Suppose we have two curves $f : [a, b] \to M$, $g : [c, d] \to M$ in a manifold
M modelled on an affine space X, with $t \in [a, b] \cap [c, d]$, $f(t) = g(t) = p$, say,
and p in the domain U of a chart $\phi : U \to X$ on M. We define f and g to
be tangent at t and p if and only if $D_t(\phi \circ f) = D_t(\phi \circ g)$ as linear maps
$T_t\mathbf{R} \to T_{\phi(p)}X$.

1. a) Prove that this definition is independent of the chart used, and that 
tangency at t is an equivalence relation on the set of paths taking t 
to p. 

Thus we have a rigorous definition of tangency, intuitively amount¬ 
ing to / and g going in the same direction through p, at the same speed. 
(Notice that this is stronger than just requiring the curves as sets, in 
the elementary sense; the parametrisations are involved.) We can use 
this to define the collection of speeds-with-direction - that is tangent 
vectors - at p, as follows. 

b) Define the sum of two paths hi, h 2 in X with h\(t) = / 12 (f) = x by 
(hi + h 2 )(s) = x + d(hi(s),x) + d(h 2 (s),x). Using this definition, 
show that if /, g are tangent vectors to /', g ' respectively at t and p, 
then for any chart <j> around p 
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X 


.1 


D t (<f> o f + <j> o g) = D t (<t> o f + <j> o g‘) 

so that <j)*~(<t> o / + <j> o g) is tangent to o f + o g') at t and p. 

Use this to define, and prove well defined, an addition on the set
$T_pM$ of tangency classes of curves at 0 and $p$. Define a scalar multiplication
similarly, and show that this gives a vector space canonically
isomorphic to the tangent space $T_pM$ as defined in the last chapter
(cf. Exercise VII.2.5).

c) Show that if $X$ is $\mathbf{R}^n$, $\phi(p) = (p^1, \dots, p^n)$ and a curve $c_i$ is given by
$c_i(s) = \phi^{\leftarrow}(p^1, \dots, p^i + s, \dots, p^n)$, then $c_i$ is a member of the tangency
class corresponding to the vector $(\partial_i)_p \in TM$ defined in VII.4.02.

d) Define a topology and differential structure on the set $\bigcup_{p \in M} \{T_pM\}$
of all tangency classes of curves in $M$ tangent at 0 and any $p \in M$,
such that they coincide with those of the last chapter.

2. If $t \in T_pM$ is represented by a curve $f$ in $M$, and $g$ is a smooth
function $M \to \mathbf{R}$, show that

$$t(g) = dg(t) = \frac{d(g \circ f)}{ds}(0)$$

in the notation of VII.1.03.

Thus $dg(t)$ is “the rate at which $g$ is changing for an observer passing
through $p$ with velocity $t$”, and we have come very close to the earliest
idea of directional derivative; “the ratio of the infinitesimal change
in $f$ to the standard infinitesimal time $dt$ in which an infinitesimal
displacement given by $t$ is made” or similar formulations.

3. Let $f : M \to N$ be a smooth map, and define $D_x f$ on any vector
$t \in T_xM$ by choosing a representative curve $c$ in the tangency class $t$,
and setting $D_x f(t)$ to be the class of $f \circ c$ at $f(x)$.

a) Prove that $D_x f(t)$ is well defined (does not depend on the choice of
representing curve).

b) Prove that $D_x f$ is linear.

c) Prove that this definition of $D_x f$ coincides with the previous one
(VII.2.03) via the isomorphism of 1b.

4. a) In Exercise VII.2.8b a manifold structure is defined on a set $f^{\leftarrow}(p) = P$,
say. For each $x \in f^{\leftarrow}(p)$, $T_xP$ corresponds exactly to the kernel in
$T_xM$ of $D_x f$.

b) Give a metric vector space $(X, G)$ the canonical affine structure and
the constant metric tensor field obtained from $G$. By Exercise VII.2.1b
and Exercise VII.2.8c, $\{x \in X \mid x \cdot x = 1\}$ is a manifold. From a)
above its tangent space at a point $x$ is just the set of vectors in $T_xX$
that are orthogonal to $d_x^{\leftarrow}(x)$.

c) Deduce using IV.2.06 that a metric tensor field is induced on $\{x \mid
x \cdot x = 1\}$. In particular, the metric tensor field induced on

$$\{A \in L(\mathbf{R}^2; \mathbf{R}^2) \mid \det A = 1\}$$

by the determinant metric tensor on $\mathbf{R}^4$ (cf. IV.1.03 and Exercise
IV.3.6) is indefinite, giving a pseudo-Riemannian manifold. What is
its signature?

2. Rolling Without Turning 

The differential $df$ of a function $f$ on an $n$-manifold $M$ is a covariant vector
field on $M$, as we have seen. That is, at each point we have a linear function
that for each tangent vector tells us how fast the value of the function will
change initially, if we whizz off in that direction and at that speed. It is a very
useful object: for example, if the function $f$ is thought of as a potential, $df$
is its gradient. Obviously, we would like to generalise this powerful operation
that gets $df$ from $f$ to tensor fields of higher order than $\binom{0}{0}$. So, what is the
change in the value of a tensor field $w$ at $p$, if we move $p$ a bit?

Out of this world. 

Out, that is, of our universe of discourse up to now: that of tensors
on $M$. If for instance $w$ is a contravariant vector field, thought of by the
embedded picture as shown in Fig. 2.1, it is clear that as we move along the
curve $c$ towards $p$ the tips of vectors at successive points are moving at right
angles to the tangent plane at $p$. Thus the direction and rate of change at $q$
of $w$ along $c$ is not, itself, a tangent vector to $M$. Nor is it any sort of tensor
on $M$. The path of the ends of the attached vectors is a curve not in $M$ but
in the affine space $X$, in which $M$ is embedded, representing a vector. But
the vector is not tangent to $M$, nor even located at $q$: it is at the tip of the
vector $w(q)$, which is not even a point in $M$. So it is useless to look for it in
some $(T^k_h M)_p$; it is a vector, but in the wrong place.

If however, we

(i) replace this vector, tangent to a point in $X$, by the corresponding
free vector (II.1.02) in $T$, which we shall call $t$,

(ii) replace $T_qM$ by the subspace of free vectors $d_q(T_qM) \subset T$,

(iii) use a metric tensor on $T$ to take the “tangential component” of $t$,
that is project it orthogonally into $d_q(T_qM)$, and finally

(iv) move the result back to $T_qM$ by applying $d_q^{\leftarrow}$,

we get a vector in $T_qM$ which will serve as a derivative of $w$ at $q$ in the
direction represented by $c$.

Another way of obtaining this derivative is to roll the affine subspace of
$X$ tangent to $M$ along the curve $c$ without turning or slipping. Technically,
this can be taken as giving a family of affine maps $\{ f_t : E^n \to X \mid t \in [a, b] \}$
(where $[a, b]$ is the domain of $c$) such that:

(a) Each $f_t$ is a “rigid position” for the Euclidean space $E^n$, that is it
preserves lengths and angles (its linear part $\mathbf{f}_t$ must have $\mathbf{f}_t(v) \cdot \mathbf{f}_t(w) = v \cdot w$).

(b) For each $t$ in $[a, b]$, $f_t(E^n)$ is the affine subspace of $X$ tangent to $M$
at $c(t)$. (This is the “rolling” condition.)

(c) For any point $x$ in $E^n$, the smooth curve

$$c_x : [a, b] \to X : t \mapsto f_t(x)$$

traced out by $x$ as we roll $E^n$ has no component of velocity tangent to $f_t(E^n)$;
$c_x^*(t) \cdot v = 0$ for any $v \in D_x f_t(T_x E^n)$. (This is the “without slipping or
turning” condition: a sliding of the subspace, for instance, would break it.)

(We defer to 6.07 the proof that if we fix a rigid position $f_0$ tangent to
$M$ at $c(0)$ there exists a unique such family, and that the derivative we get
is independent of the choice we make of $f_0$.)

Now, as the subspace rolls, the vector field $w$ on $M$ specifies a vector in
it tangent to $M$ at $c(t)$, for each successive position $f_t$. The result is a curve
$\tilde c$ in $E^n$, with $\tilde c(t) = f_t^{\leftarrow}(c(t) + w(c(t)))$. We can differentiate this to give
$\tilde c^*(t)$, translate the result to a vector, $v$ say, at $f_t^{\leftarrow}(c(t))$, and get the same
vector $Df_t(v)$ as a derivative of $w$ by $c^*(t)$ as by the previous procedure.
We leave the formalities to Exercise 1b since as usual we shall do our more
detailed work with the bundle picture.

What we have achieved is a way of “connecting” the successive tangent 
spaces along a curve. A “direction” tangent to M is assigned to a change from 
a vector in one tangent space to one in another (hence the name “connection” 
for the formulation we shall introduce shortly). It has a slightly curious 
feature. If we connect the tangent space at p to that at q by rolling along a 
curve between them, we get a map 

$$T_pM \to T_qM : v \mapsto (\text{vector } v \text{ is rolled to})$$

which is plainly affine, but need not be linear. Rolling the tangent line at
(0,1) once clockwise around the unit circle in $\mathbf{R}^2$ (Fig. 2.2) carries its origin
to the old position of the point $-2\pi$. This comes of rolling entirely without
slipping: in the previous description, it comes of looking only at changes 
in the tips of vectors along c, ignoring movement of the roots. Not quite 
so natural at first glance, but often more useful, is a version which in our 
first description translates the tangent vectors at the points c(t) to some 
standard point before differentiating the movement of the ends, and in our 
second keeps sliding the tangent space to keep the origin always at the point 
of contact as the space rolls, still without turning. This makes the vector 
field on $\mathbf{R}^2$ sketched in Fig. 2.3b, rather than a, the constant one by the test
of the vector tip at one point being carried to that at another: to roll the
tangent plane to a flat plane in $\mathbf{R}^3$ along a curve in it, without sliding, is to
keep it fixed. Since this involves linear maps between the tangent spaces,
while the first involved affine ones, we have correspondingly linear and affine 
connections. Since linear connections are much more widely useful than affine 
ones, they are often called simply connections (and in a few books miscalled 
affine connections. Be warned.) 

Notice that the effect of rolling a vector from $p$ to $q$ depends on the
route. Fig. 2.4 illustrates this (for the slide-back-to-the-origin, linearised way
of rolling) for a vector at a point $A$ on the equator rolled to another point $B$,
a) along the equator, b) via the North Pole.
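
The route dependence can be checked numerically. The following is a minimal sketch, not the construction of the text: it assumes the unit sphere in $\mathbf{R}^3$ with the usual dot product, and it approximates the linearised rolling by repeatedly projecting the vector onto the tangent plane at closely spaced points of the route (a discretisation chosen here for illustration, under which lengths shrink very slightly). The helper names `transport` and `arc` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transport(v, path_points):
    """Carry v along points of the unit sphere by repeated tangential projection."""
    for n in path_points:
        k = dot(v, n)  # component of v along the unit normal n at this point
        v = tuple(x - k * y for x, y in zip(v, n))
    return v

def arc(p, q, steps):
    """Points of the quarter great circle from p to q (p, q orthonormal)."""
    pts = []
    for i in range(1, steps + 1):
        a = (math.pi / 2) * i / steps
        pts.append(tuple(math.cos(a) * x + math.sin(a) * y for x, y in zip(p, q)))
    return pts

A, B, NORTH = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
v0 = (0.0, 0.0, 1.0)  # tangent vector at A, pointing north
steps = 2000

via_equator = transport(v0, arc(A, B, steps))
via_pole = transport(v0, arc(A, NORTH, steps) + arc(NORTH, B, steps))

print(via_equator)  # roughly (0, 0, 1): still pointing north at B
print(via_pole)     # roughly (-1, 0, 0): at right angles to the other result
```

Under these assumptions the two routes deliver vectors at $B$ that differ by a right angle, which is the kind of discrepancy Fig. 2.4 is described as illustrating.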

The other crucial dependence is on the metric on $X$. We can get a
different derivative by using a different metric (most dramatically, if we switch
from a definite to an indefinite metric. In this case we get very different
isometries.) However it turns out to be a dependence only on the metric as
restricted to the tangent spaces: any embedding of $M$ in an affine space with
a constant metric tensor that induces the same metric tensor on $M$ gives
the same connection. This is a remarkable fact, best proved by showing the
derivative above to be the same as that defined in the next section, which
only uses the metric on the tangent spaces, a proof outlined in Exercise 6.3.
We leave it as an exercise, because it is the intrinsic, not the embedded
description which is important for spacetime in current theories. (Science
fiction has embedded the universe in all kinds of things, including abstract
mathematics as a concrete object [Kagan], but physics to date has kept to
the thing in itself.) We shall just use embeddings for illustrations.

Fig. 2.4


Exercises VIII.2

1. This exercise is a formalisation of the above, to allow a later proof 
that it gives exactly the Levi-Civita connection on M (defined in 6.05 
below), so that we can illustrate in the embedded picture such things 
as parallel transport (defined in §4 below). If you are happy with the 
pictures, and prepared to believe that they correspond to the Levi- 
Civita connection, you can ignore it. 

Let $M$ be embedded in an affine space $X$ with vector space $T$,
freeing maps $d_x$ and a constant metric tensor $G$, in such a way that
$G$ restricts to a metric on each tangent space (cf. VII.3.05). Assume
the existence for any curve $c : [a, b] \to M$ with $c(0) = p$, say, of a
family of linear maps $A_t : \mathbf{R}^n \to T$ (not affine to $X$, now) for each
$t \in [a, b]$, with $\mathbf{R}^n$ having the standard inner product or one of the
standard indefinite metrics, such that (cf. 6.07 below)

(i) $G(A_t v, A_t w) = v \cdot w$ for all $v, w \in \mathbf{R}^n$, $t \in [a, b]$.

(ii) $d_{c(t)}^{\leftarrow}(A_t(\mathbf{R}^n)) = T_{c(t)}M$ as sets, for all $t$.

(iii) For any point $x \in \mathbf{R}^n$, the curve $c_x : [a, b] \to T : t \mapsto A_t(x)$ has
$c_x^*(t) \cdot v = 0$ whenever $v$ is a vector at $c_x(t)$ tangent to $A_t(\mathbf{R}^n)$.
We think of $A_t$ as a “position” of $\mathbf{R}^n$ in $T$; the change of $A_t$ with
$t$ copies at $0 \in T$ the rolling around of the tangent spaces at $c(t)$
as $t$ changes.

(This is the formalisation appropriate for the idea of rolling with slipping,
to keep the origin as the point of tangency, but without turning.)

For any vector field $w$ on $M$ we define a curve $\tilde w : [a, b] \to \mathbf{R}^n$ by
$\tilde w(t) = A_t^{\leftarrow} d_{c(t)} w(c(t))$, and set

$$\nabla_t w = A_t(\tilde w^*(0))$$

where $t = c^*(0)$. Then

a) Prove that this coincides with the result of setting

$$\tilde w(t) = d_{c(t)} w(c(t)) , \qquad \nabla_t w = d_{c(t)}^{\leftarrow} P_t(\tilde w^*(t))$$

where $P_t$ is orthogonal projection $T \to d_{c(t)}(T_{c(t)}M)$.

(This is the linear version of the affine construction first discussed
in §2.)

b) Show that the rolling without slipping discussed in the text gives the
same derivative as the one obtained by differentiating the path defined
by the tips of the tangent vectors (ignoring the changes in their
roots), translating what you get to the point of interest, and taking
the component tangential to $M$ of the result. Can you prove from this
definition of $\nabla_t w$ that it is independent of the choice of path $c$ representing
$t$? (This result follows from the intrinsic approach without a
separate proof.)
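
As a numerical illustration of the projection recipe in part a), and not a solution to the exercise, here is a minimal sketch. It assumes the unit sphere in $\mathbf{R}^3$ with the usual dot product, takes the freed vectors of $w$ along $c$ as plain 3-tuples, and replaces the derivative by a central difference; the names `embedded_derivative` and `latitude` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def project_tangent(v, n):
    """Orthogonal projection of v onto the plane with unit normal n."""
    k = dot(v, n)
    return tuple(x - k * y for x, y in zip(v, n))

def embedded_derivative(c, w_along_c, t, h=1e-5):
    # Central difference of the freed vectors of w along c.
    wp, wm = w_along_c(t + h), w_along_c(t - h)
    dw = tuple((p - m) / (2 * h) for p, m in zip(wp, wm))
    # On the unit sphere the unit normal at c(t) is c(t) itself.
    return project_tangent(dw, c(t))

def latitude(theta0):
    """Circle of colatitude theta0 on the unit sphere, with its unit tangent field."""
    def c(t):
        return (math.sin(theta0) * math.cos(t),
                math.sin(theta0) * math.sin(t),
                math.cos(theta0))
    def w(t):
        return (-math.sin(t), math.cos(t), 0.0)
    return c, w

equator_c, equator_w = latitude(math.pi / 2)
small_c, small_w = latitude(math.pi / 3)

print(norm(embedded_derivative(equator_c, equator_w, 0.7)))  # close to 0
print(norm(embedded_derivative(small_c, small_w, 0.7)))      # close to cos(pi/3) = 0.5
```

Along the equator the ambient derivative of the unit tangent field is purely normal, so its tangential part vanishes; along a smaller circle of latitude a tangential part of size $\cos\theta_0$ survives.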

3. Differentiating Sections 

We turn now to considering a vector field as a section of the tangent bundle, 
without involving embeddings. This is logically tidier, since the relationship 
between a manifold and its tangent bundle is fixed, while the embedded 
picture requires a choice of embedding. In coordinates it emerges as far 
more convenient, using the charts on TM we constructed from those on M 
in VII.3.01. 

Now the vector field $w$ is just a smooth map $M \to TM$, and the results
of “changing $p$ a bit” in various directions are summed up by its differential
(cf. Exercise 1.3, VII.3.02). Now the differential of a map between any two
manifolds goes from the tangent bundle of one to the tangent bundle of the
other, so we have

$$Dw : TM \to T(TM)$$

The domain of this is as we want it; we are looking for a derivative corresponding
to each vector tangent to $M$; but the image is in the wrong place.
We get vectors tangent to $TM$, not to $M$. (In classical coordinate notation -
with most of the functions and arguments involved suppressed - it shows up
at once that differentiating a vector field on $M$ gives no sort of tensor field on
$M$ directly. But what it does give is less than clear. Historically this made it
much harder to find a way of “correcting” the result to give something more
manageable.)

Fig. 3.1


In Fig. 3.2 we draw $M$ and its tangent bundle slightly bent, to let us use
the embedded picture to represent the tangent spaces at $p$ and $w(p)$ to $M$
and $TM$. For a vector $t$ tangent to $M$ at $p$ we have $(D_p w)t$ at $w(p)$ tangent
to $TM$; a vector, but in the wrong place. We want, as in the previous section,
to swap it for a vector tangent at $p$ to $M$.

In this picture, the interesting part of $D_p w(t)$ is obviously its “vertical
component”; how $w$ is changing at $p$ for us as we go through $p$ at velocity $t$.
This unfortunately is less easy to discover than its “horizontal component”.
For we have a natural projection $T_{w(p)}(TM) \to TM$ in the form of $D_{w(p)}\Pi$.
This just expresses the fact that we are going at velocity $t$:

$$D_{w(p)}\Pi(D_p w(t)) = D_p(\Pi \circ w)(t) = D_p(I_M)(t) = I_{T_pM}(t) = t$$

by the chain rule and the definition of a vector field. What we want is a
projection of $T_{w(p)}(TM)$ onto the subspace of the “vertical” vectors tangent
to $TM$ at $w(p)$, that is those tangent to the fibre $T_pM \subset TM$. Once we have
taken $D_p w(t)$ to its component tangent to $T_pM$, we can use $T_pM$'s nice flat
vector space structure to look at this as a vector in $T_pM$ in an unambiguous
way because $T_pM$ is an affine space with itself as vector space. Then we
have - it turns out - a derivative, with nice properties.

Now, taking the component of a vector in a subspace $S$ is exactly applying
orthogonal projection onto $S$ (IV.2.01), which depends on a metric and is
different for different metrics. So we must now work with $M$ a Riemannian or
pseudo-Riemannian manifold with metric tensor field $G$. Regrettably, this
does not solve the problem at once, since it gives a metric tensor on each
$T_pM$, not on each $T_w(TM)$. $G$ on $M$ does in fact produce a canonical metric
on $TM$ (Exercise 6.6), but a direct definition of it from $G$ would not
be geometrically intuitive. We shall therefore concentrate on the orthogonal
projection, which by Exercise 1 is logically equivalent to the metric. Let
us look at the consequences of having such a projection $P_v$ at each point
$v \in TM$. This is the most geometric definition of a connection (Exercise 1),
and we use it to motivate the most formally powerful.

An orthogonal projection $P$ in a space $X$ gives a decomposition of the
space into the direct sum of its image $P(X)$ and its kernel $\ker P = (P(X))^{\perp}$
(Exercise VII.3.1d). In this instance we shall call the image $P_v(T_v(TM))$,
which can be identified naturally with $T_v(T_{\Pi(v)}M)$, the space of vertical
vectors $V_v(M)$ at $v$, and its orthogonal complement the space of horizontal
vectors $H_v(M)$ at $v$. (Fig. 3.3 is drawn as it is to emphasise that the idea of
“orthogonal” varies with the metric. “Horizontal” means, by definition, “in
$\ker P_v$”, not “looks level in the picture”. Remember the variety of orthogonal
projections in Fig. IV.2.2.)

Fig. 3.3


For any vector field $w$ on $M$ and vector $t \in T_pM$, we can use these
projections to define exactly a “directional derivative” of $w$ by $t$ at $p$, in
$T_pM$ where we want it:

$$\nabla_t w = d_{w(p)}(P_{w(p)}(D_p w(t)))$$

Here $D_p w$ is the derivative at $p$ of $w$ as a map from $M$ to $TM$, $P_{w(p)}$ is the
projection $T_w(TM) \to V_w(M)$ we are assuming we have, and $d_{w(p)}$ is the
freeing map taking vectors in $V_w(M) = T_w(T_pM)$ to vectors in $T_pM$ itself,
using the vector/affine space structure of $T_pM$. ($\nabla$ is generally pronounced
“del”, or sometimes “nabla”, after an ancient Hebrew instrument of the same
shape.)

Clearly, $\nabla_t w$ will depend linearly on $t$, for $d_{w(p)}$, $P_{w(p)}$ and $D_p w$ are all
linear. How would it behave for different $w$? Since we have not formulated
the conditions that the $P_v$ must satisfy as we vary $v$, we cannot deduce this
behaviour from the foregoing: we are free to decide what properties would
be nice to have$^1$.


1 Subject of course to arriving at the usual answer, which this book is supposed to communicate.
It all rather suggests a theological scheme in which you have free will in
the matter of deciding why it is a good idea to do what you are predestined to. The
elaboration of unusual answers is called research.

Fig. 3.4


First, we obviously want linearity. If $u$, $w$ are two vector fields and $\lambda$ a
real number, we want

$$\nabla_t(u + w) = \nabla_t u + \nabla_t w , \qquad \nabla_t(\lambda w) = \lambda \nabla_t w .$$

For the whole idea of the differential calculus is to make everything linear
whenever possible; the differential of a map is just its replacement by a linear
approximation at each point. However, we need rather more: $w$ being a
vector field ($D_p w$, and hence $\nabla_t w$, are not defined if $w$ is just a vector at $p$)
we may want to multiply it not just by a single constant everywhere, but by a
function $f$ on $M$. We cannot expect simply $\nabla_t(fw) = f(p)\nabla_t w$. For instance
suppose $w$ and $f$ on $M = \mathbf{R}$ are as illustrated in Fig. 3.4, so that $fw$ must be
as shown in (c). Now $\nabla_t(fw)$ is supposed to measure the rate of change in
$w$ at $p$ in a way related to the usual metric on $\mathbf{R}$. Therefore $\nabla_t w$ and hence
$f(p)\nabla_t w$ should plainly be positive, as $w$ is “increasing” to the right. But
equally plainly $\nabla_t(fw)$ should be negative. The next simplest formula is the
analogue of that for differentiating products of functions (Exercise VII.1.5b),
the Leibniz rule

$$t(fg) = (t(f))\,g(p) + f(p)\,(t(g)) .$$

This suggests

$$\nabla_t(fw) = (t(f))\,w(p) + f(p)\,\nabla_t w .$$

For we certainly want the effect of $\nabla_t$ to generalise the effect of $t$ on functions
(alias $\binom{0}{0}$-tensor fields): when we define $\nabla_t$ for general tensor fields we want
it to coincide with what we already have for $\binom{0}{0}$-tensors (cf. also Exercise 2).

Finally, we want everything to stay smooth. If $w$ is a smooth vector
field, and instead of differentiating just at one point, with respect to a single
vector, we take another smooth vector field $t$ and find $\nabla_{t(p)} w$ at each point
$p$, we get a new vector field. If this is not smooth, our projections $P_v$ were
not smoothly chosen and could not have come from a smooth Riemannian
metric on $TM$.

We can summarise these requirements as follows: 

3.01. Definition. A connection on a manifold $M$ is a function $\nabla$ which
assigns to every tangent vector $t$ and $C^\infty$ vector field $w$ on $M$ a vector $\nabla_t w$
in $T_pM$ (where $t$ is at $p \in M$), such that

C i) $\nabla_{s+t} w = \nabla_s w + \nabla_t w$ for any $s$, $t$ in the same tangent space, and
vector field $w$.

C ii) $\nabla_t(u + w) = \nabla_t u + \nabla_t w$ for any $t \in TM$, $u$, $w$ vector fields on $M$.

C iii) $\nabla_{\lambda t} w = \lambda \nabla_t w$ for any $t \in TM$, $w$ a vector field.

C iv) $\nabla_t(fw) = (t(f))\,w(p) + f(p)\,\nabla_t w$ for any $t \in T_pM$, $w$ a vector field
on $M$, and $C^\infty$ $f : M \to \mathbf{R}$.

C v) If $t$, $w$ are $C^\infty$ vector fields, so is

$$\nabla_t w : p \mapsto \nabla_{t(p)} w .$$

It turns out that many of the important tools in differential geometry 
can be derived from a connection, to the point that we could almost forget 
about metrics. We shall not so forget, but we shall investigate the geometry 
of a “manifold with connection” as a thing in itself for a while before coming 
back to relate connections to metrics. 

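3.02. Coordinates. First, let us see what a connection looks like in coordinates,
as some proofs will be easiest that way. We use a chart
$\phi : U \to \mathbf{R}^n : p \mapsto (x^1(p), \dots, x^n(p))$ and the basis vector fields $\partial_1, \dots, \partial_n$ set up
in VII.4.02. Writing $t = t^i \partial_i$, $w = w^j \partial_j$ we have

$$\begin{aligned}
\nabla_{t(p)} w &= \nabla_{t^i(p)\partial_i(p)}(w^j \partial_j) \\
&= t^i(p)\,\nabla_{\partial_i(p)}(w^j \partial_j) && \text{by C i), C iii)} \\
&= t^i(p)\bigl(\partial_i(w^j)\,\partial_j + w^j(p)\,\nabla_{\partial_i(p)}(\partial_j)\bigr) && \text{by C ii), C iv)} \\
&= t^i \partial_i(w^j)\,\partial_j + t^i w^j\,(\nabla_{\partial_i}\partial_j) ,
\end{aligned}$$

suppressing reference to $p$. Now the first term in this sum is already “in
coordinates”. For $t^i \partial_i(w^j)\partial_j$ means exactly

$$\left( t^1 \frac{\partial w^1}{\partial x^1} + \dots + t^n \frac{\partial w^1}{\partial x^n},\;\; t^1 \frac{\partial w^2}{\partial x^1} + \dots + t^n \frac{\partial w^2}{\partial x^n},\;\; \dots,\;\; t^1 \frac{\partial w^n}{\partial x^1} + \dots + t^n \frac{\partial w^n}{\partial x^n} \right) ,$$

disentangling summations (cf. VII.4.02); but $\nabla_{\partial_i}\partial_j$ is just some vector in
$T_pM$ determined by $\nabla$, $\partial_i$ and $\partial_j$. We shall represent this vector in components
by

$$\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k .$$

The $n^3$ functions $\Gamma^k_{ij}$ for $i, j, k = 1, \dots, n$ so defined are called the Christoffel
symbols of the connection $\nabla$ with respect to this chart and $\nabla$ thus has the
coordinate form

$$\nabla_t w = t^i\bigl(\partial_i(w^k) + w^j \Gamma^k_{ij}\bigr)\partial_k ,$$

changing one dummy index.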
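
To see the coordinate description of 3.02 at work, here is a minimal numerical sketch. It reads the coordinate form as $(\nabla_t w)^k = t^i(\partial_i w^k + w^j \Gamma^k_{ij})$ with $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k$, and it assumes, as an example not taken from the text, the usual flat connection on the plane written in polar coordinates $(r, \theta)$, whose nonzero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$. Partial derivatives are replaced by central differences, and all function names are hypothetical.

```python
import math

def christoffel_polar(r, theta):
    """G[k][i][j] = Gamma^k_ij for the flat plane in polar coordinates (assumed example)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r        # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return G

def covariant_derivative(t, w, gamma, x, h=1e-6):
    """Components t^i (d_i w^k + w^j Gamma^k_ij) at the point with coordinates x."""
    n = len(x)
    G = gamma(*x)
    wx = w(*x)
    out = []
    for k in range(n):
        total = 0.0
        for i in range(n):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            d_i_wk = (w(*xp)[k] - w(*xm)[k]) / (2 * h)  # central difference
            total += t[i] * (d_i_wk + sum(wx[j] * G[k][i][j] for j in range(n)))
        out.append(total)
    return out

def d_x_in_polar(r, theta):
    # Components of the Cartesian basis field d/dx in the polar basis.
    return (math.cos(theta), -math.sin(theta) / r)

print(covariant_derivative((0.3, -1.2), d_x_in_polar, christoffel_polar, (2.0, 0.8)))
# both components should be close to 0, since d/dx is constant on the plane
```

The field $\partial/\partial x$ has non-constant polar components, so its partial derivatives alone do not vanish; it is the Christoffel term that cancels them.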

3.03. Transformation Formula. If we change to another chart $\tilde\phi : U \to
\mathbf{R}^n : p \mapsto (\tilde x^1(p), \dots, \tilde x^n(p))$ we get a new basis $\tilde\partial_1, \dots, \tilde\partial_n$ for $T_pM$ and a new
lot $\tilde\Gamma^\gamma_{\alpha\beta}$ of Christoffel symbols. The two are related by the formula

$$(*) \qquad \tilde\Gamma^\gamma_{\alpha\beta} = \bigl(\Gamma^k_{ij}\,\tilde\partial_\alpha(x^i)\,\tilde\partial_\beta(x^j) + \tilde\partial_\alpha(\tilde\partial_\beta(x^k))\bigr)\bigl(\partial_k(\tilde x^\gamma)\bigr)$$

(Exercise 3a) or equivalently

$$\tilde\Gamma^\gamma_{\alpha\beta}\,\tilde\partial_\gamma(x^k) = \Gamma^k_{ij}\,\tilde\partial_\alpha(x^i)\,\tilde\partial_\beta(x^j) + \tilde\partial_\alpha(\tilde\partial_\beta(x^k)) .$$

(Recall that $\partial_k(\tilde x^\gamma)\,\tilde\partial_\gamma(x^l) = \delta^l_k$, since $\partial_k(\tilde x^\gamma)$ and $\tilde\partial_\gamma(x^k)$ are components
of the two change-of-basis matrices for $T_pM$.) This constitutes the classical
definition of a connection (“a set of numbers that transform according to *”).
The essential equivalence of this to 3.01 follows from Exercise 3b.

It is clear that * is not the transformation law for the components of
any sort of tensor (the first term is just the formula for $\binom{1}{2}$-tensors, but the
other involves a second differential). This is reasonable, as the $\Gamma^k_{ij}$ are a
kind of correction term to bring erring derivatives back into the tensor fold.
Roughly, “if it differed only by a tensor from being a tensor it would
be a tensor anyway”. (The more classical texts derive * from the requirement
that the expression $(u^i\partial_i(v^k) + u^i v^j \Gamma^k_{ij})\partial_k$ - usually omitting the basis vectors
$\partial_k$ - should transform as a vector, define “connection” from *, and proceed
from there.) The $\Gamma^k_{ij}$ are not, then, the components of a tensor; to anticipate
the language of 3.07, $\Gamma^k_{ij}\partial_k$ is the vertical part of the derivative of $\partial_j$ in the
direction $\partial_i$ (the significant change in $\partial_j$ as we move in the $\partial_i$-direction) as
measured by this connection, and $\Gamma^k_{ij}$ is its $k$-th component.

We shall make some use of 3.02, but none of *, since the coordinate-free
characterisation C i), ..., C v) of connections is far more convenient; so * is
there as part of our general programme of relating “numbers that transform
right” to geometrically defined objects.

Returning to the geometry of a manifold with connection: let us recover
the decomposition $V_v(M) \oplus H_v(M)$ via which we motivated 3.01, from a
connection satisfying C i), ..., C v). First we need:

3.04. Definition. For a curve $c : [a, b] \to M$, a $C^\infty$ vector field along $c$ is a
$C^\infty$ function giving a vector tangent to $M$ at $c(t)$ for each $t \in [a, b]$; that is
a map $v : [a, b] \to TM$ such that $\Pi \circ v = c$.

Important examples of this are the tangent vector field $c^*$ of $c$ (Fig. 3.5a),
and the restriction to $c$ of a vector field $w$ on $M$ (Fig. 3.5b) which assigns
the vector $w(c(t))$ to the point $t \in [a, b]$, precisely $w \circ c$.

Fig. 3.5


3.05. Differentiating, Along Curves, Fields Along Curves. Our first
approach, in §2, to finding a candidate for $\nabla_{\mathbf{t}} w$ involved only the values of $w$
at points $c(t)$ in $M$, and so extends at once to vector fields $w$ along $c$ as well as
on $M$. We have chosen 3.01 as our formal starting point, however. We must
therefore show that $\nabla_{\mathbf{t}} w$ for a connection satisfying C i), ..., C v) depends
only on the restriction of $w$ to a typical representative $c$ with $\mathbf{t} = c^*(t)$ of the
tangency class $\mathbf{t}$, and that we can extend this differentiation, of restrictions
to $c$ of vector fields on $M$, to a differentiation of general vector fields along
$c$. This necessary check of a credible fact is left as Exercise 4.

We denote the resulting linear map, taking vector fields along c (not 
vector fields on M ) to vector fields along c, by V c * not V c * (although T c *w = 
V c * w whenever w is the restriction of w to c) to emphasise the difference in 
their domains. 

Now we are equipped to decompose $T_v(TM)$.

3.06. Definition. A vector $v \in T_w(TM)$, where $w \in T_pM$, is vertical if
$D_w\Pi : T_w(TM) \to T_pM$ has $D_w\Pi(v) = 0$. We denote the space $\ker(D_w\Pi)$
of such vectors by $V_w(M)$, as in our less formal discussion at the beginning
of this section.

Fig. 3.6

If $v$ is not vertical we can find a path $\tilde c : [a, b] \to TM$ to represent it and
get a curve $c = \Pi \circ \tilde c$ in $M$ with

$$c(0) = p , \qquad c^*(0) = D_w\Pi(\tilde c^*(0)) = D_w\Pi(v) \neq 0$$

using §1. For each $t \in [a, b]$, $\tilde c(t)$ is a vector at $c(t)$; $\tilde c$ exactly gives a vector
field $\tilde c$ along $c$. In this notation, we make

3.07. Definition. The vertical part of $v$ with respect to $\nabla$ is $v$ itself if
$D_w\Pi(v) = 0$, otherwise it is

$$d_w^{\leftarrow}(\nabla_{c^*}\tilde c(0)) \in T_w(T_pM) \subset T_w(TM)$$

where $\tilde c$ is as above. (That this is well defined, depending on only $v$ and $w$,
is Exercise 5a.)

Defining the projection (cf. Exercise 5b)

$$P_w : T_w(TM) \to V_w(M) : v \mapsto (\text{vertical part of } v) ,$$

we say $v$ is horizontal if its vertical part $P_w(v)$ is 0, and set $H_w(M) =
\ker P_w$. Exercises 5c, d show that these $P_w$ are exactly those that give $\nabla$
as discussed at the beginning of §3, so that the $P_w$ and $\nabla$ are equivalent
structures containing the same information.

The horizontal part of $t \in T_w(TM)$ is $t - P_w t$.

3.08. Language. A connection defined as in 3.01 is called a Koszul connection;
the corresponding splitting of the $T_w(TM)$ into horizontal and vertical
parts is an Ehresmann connection. The two equivalent conceptions illuminate
each other and the coordinate definition as the various constructions of
tangent spaces do. Pioneering work on the geometrical role of connections
in spacetime was done by Hermann Weyl about sixty years ago, using the
coordinate description.


Exercises VIII.3

1. If $G$ is a metric tensor field on $M$, the metric tensor $G_p$ on $T_pM$
gives a corresponding constant metric tensor $G$ on $T_pM$. Identifying
$T_v(T_pM)$ with $V_v(M) = \ker(D_v\Pi)$ in the natural way, show that for
any idempotent operator $P_v$ on $T_v(TM)$ with image $V_v(M)$,

a) $x \cdot y = G_p((D_v\Pi)x, (D_v\Pi)y) + G_v(P_v x, P_v y)$ defines a metric tensor
on $T_v(TM)$, and

b) $P_v$ is orthogonal projection onto $V_v(M)$ with respect to this metric
tensor.

2. Define a projection $P_x : T_x(T^0_0M) \to T_x((T^0_0M)_p)$ where $x$ is a $\binom{0}{0}$-tensor
at $p$, using the natural identification of $T^0_0M$ with $M \times \mathbf{R}$, such
that for $t \in T_pM$ the map $\nabla_t$ defined on $\binom{0}{0}$-tensor fields by

$$\nabla_t(f) = d_{f(p)}\,P_{f(p)}\,D_p f(t)$$

coincides with $t : f \mapsto df(t)$, so that C iv) becomes

$$\nabla_t(fw) = (\nabla_t f)\,w(p) + f(p)\,\nabla_t w .$$

3. a) Establish equation * of 3.03 by using C i), ..., C v), VII.4.04 and the
fact that $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k$ is the same vector however it is labelled.

b) Suppose that we have a rule that, given a chart $U \to \mathbf{R}^n$, produces $n^3$
functions $\Gamma^k_{ij} : U \to \mathbf{R}$ in such a way that where two charts overlap the
results of applying the rule are related by *. Show that the formula

$$\nabla_u v = u^i \partial_i(v^k)\,\partial_k + u^i v^j \Gamma^k_{ij}\,\partial_k$$

defines the same vector field around any point whatever chart is used,
and that $\nabla$ so defined satisfies 3.01.


4. a) Show that if $c^*(t) \neq 0$ for $c : [a, b] \to M$, some $t \in [a, b]$, any vector
field $v$ along $c$ is locally a restriction of some vector field $\tilde v$, on a
neighbourhood $W$ of $c(t)$ in $M$, to $c|_J$ where $J$ is a neighbourhood
of $t$. (By Exercise VII.2.7a there is a choice of coordinates that makes
this very easy.)

b) Define

$$\nabla_{c^*} v(t) = \begin{cases} \nabla_{c^*(t)}\tilde v , & \tilde v \text{ being as in a), if } c^*(t) \neq 0 \\ 0 , & \text{if } c^*(t) = 0 \end{cases}$$

and show that it depends only on $v$, not on the choice $\tilde v$ of extension.
(Work in coordinates to get

$$\nabla_{c^*} v = \left( \frac{dv^k}{dt} + (\Gamma^k_{ij} \circ c)\,\frac{d(x^i \circ c)}{dt}\,v^j \right) \partial_k$$

and deduce the result from this.)

c) Show that $\nabla_{c^*}(v + v') = \nabla_{c^*} v + \nabla_{c^*} v'$, that $\nabla_{c^*}(\lambda v) = \lambda\nabla_{c^*} v$ for
$\lambda \in \mathbf{R}$, that

$$\nabla_{c^*}(fv) = \frac{df}{dt}\,v + f\,\nabla_{c^*} v ,$$

for $f : [a, b] \to \mathbf{R}$, and that if $v$ is a smooth vector field along $c$, then so
is $\nabla_{c^*} v$. (In particular, check at points where $c^*(t)$ becomes zero and
the definition changes from “$\nabla_{c^*(t)}$ of a local extension of $v$ to $M$” to
simply “0”.)

5. a) In the notation of 3.07, show that $\nabla_{c^*}\tilde c$ is independent of the choice
of path $\tilde c$ representing $v$. (Either use geometrical devices with paths,
or bash it with the coordinate equation of Exercise 4b.)

b) Show that $P_w$ of 3.07 is linear, despite the different definitions on
different subsets of $T_w(TM)$.

c) Show that

$$\nabla_{c^*(t)}\tilde c = d_{\tilde c(t)}\bigl(P_{\tilde c(t)}(D_t\tilde c(e_t))\bigr)$$

where $e_t \in T_t\mathbf{R}$ is the standard unit basis vector.

d) Deduce that for $w$ a vector field on $M$, $t \in T_pM$,

$$\nabla_t w = d_{w(p)}(P_{w(p)}(D_p w(t))) .$$

e) Using the coordinates on $TM$ induced by a chart $U \to \mathbf{R}^n$ on $M$
(cf. Exercise VII.4.7) and the corresponding basis for $T_v(TM)$ (if the
coordinates of $p$ are $(x^1, \dots, x^n)$, and those of $v$ are $(x^1, \dots, x^n, v^1, \dots,
v^n)$, the basis is $\left(\frac{\partial}{\partial x^1}, \dots, \frac{\partial}{\partial x^n}, \frac{\partial}{\partial v^1}, \dots, \frac{\partial}{\partial v^n}\right)$) find the components of
$P_v$.

6. a) Show that if a linear map $A : X \to Y$ is surjective, and $X =
(\ker A) \oplus B$ for some subspace $B \subset X$, then $A|_B$ is an isomorphism.

b) Deduce that there is exactly one horizontal vector $\tilde t$ at any $v \in T_pM$
such that $D_v\Pi(\tilde t)$ is a given vector $t \in T_pM$.


4. Parallel Transport 

4.01. Definition. A vector field $v$ along a curve $c$ in a manifold $M$ (with a
connection $\nabla$) is parallel if

$$\nabla_{c^*} v = 0$$

or equivalently if $v$, considered as a curve in $TM$, has $v^*(t)$ horizontal for
all $t$.

A vector field $w$ on $M$ is parallel along $c$ if $w \circ c$ is parallel, and $w$ is
parallel if it is parallel along all curves. (Most such $M$ have no parallel vector
fields on them, as we shall see in Chapter X.)
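
In coordinates the condition $\nabla_{c^*} v = 0$ becomes an ordinary differential equation, which can be integrated numerically. The sketch below is an illustration under stated assumptions rather than anything taken from the text: it reads the coordinate equation of Exercise 3.4b as $dv^k/dt + \Gamma^k_{ij}\,(dc^i/dt)\,v^j = 0$, uses the unit sphere with coordinates $(\theta, \phi)$ = (colatitude, longitude) and the Christoffel symbols of its usual connection, $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$, and integrates once around a circle of latitude with a classical Runge-Kutta step. The function name is hypothetical.

```python
import math

def transport_around_latitude(theta0, v0, steps=4000):
    """Integrate dv^k/dt = -Gamma^k_ij (dc^i/dt) v^j along c(t) = (theta0, t), 0 <= t <= 2 pi.

    v0 and the result are component pairs (v^theta, v^phi) in the coordinate basis.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        v_theta, v_phi = v
        # Only dc^phi/dt = 1 is nonzero along a circle of latitude.
        return (s * c * v_phi, -(c / s) * v_theta)

    def axpy(a, x, y):
        return tuple(a * xi + yi for xi, yi in zip(x, y))

    h = 2 * math.pi / steps
    v = tuple(v0)
    for _ in range(steps):  # classical fourth-order Runge-Kutta
        k1 = rhs(v)
        k2 = rhs(axpy(h / 2, k1, v))
        k3 = rhs(axpy(h / 2, k2, v))
        k4 = rhs(axpy(h, k3, v))
        v = tuple(vi + h / 6 * (a + 2 * b + 2 * d + e)
                  for vi, a, b, d, e in zip(v, k1, k2, k3, k4))
    return v

print(transport_around_latitude(math.pi / 3, (1.0, 0.0)))
# roughly (-1, 0): at colatitude pi/3 the vector returns reversed
```

Because the equation is linear in $v$, doubling the initial vector doubles the result, in keeping with the linearity asserted in Theorem 4.02 below.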

Fig. 4.1 illustrates parallel fields along curves in $S^2$ and $\mathbf{R}^2$, with their
usual connections (associated with their usual metrics in a way we discuss
in §6). In “rolling” terms, we shall see (6.07) that a parallel vector field
along a curve $c$ is one for which any vector $v(t)$ is carried to $v(t')$, for any $t$,
$t'$, by rolling without turning of the tangent spaces along $c$ from $c(t)$ to $c(t')$.
This suggests that each vector in $T_{c(t)}M$ should be part of one and only one
parallel vector field along $c$. This is indeed true:

Fig. 4.1 (a), (b)

4.02. Theorem. If $c : J \to M$ is a curve in a manifold with connection,
and $t_0 \in J$, then (putting $c(t_0) = x$) for each $v \in T_xM$ there is exactly one
parallel vector field $w_v$ along $c$ with $w_v(t_0) = v$. Moreover the map

$$\tau_t : T_xM \to T_{c(t)}M : v \mapsto w_v(t)$$

is linear and an isomorphism.

Proof. Without loss of generality (why?), suppose $J = \mathbf{R}$, $t_0 = 0$.

We need a differentiable manifold structure on the set $N = \Pi^{\leftarrow}(c(J))$
for this proof. Since this is tricky if $c$ or $Dc$ is not injective, we replace $M$ by
$M \times \mathbf{R}$, $c$ by $\tilde c : t \mapsto (c(t), t)$, and the connection $\nabla$ by the product connection
on $M \times \mathbf{R}$ (Exercise 1); by Exercise 2 we get an equivalent problem. Exercise 3
reduces the question to the solution of a differential equation, so that we can
apply VII.6.04 to get

(*) For any $v \in T_xM$ there is $\varepsilon \in \mathbf{R}$ and a unique parallel
vector field $w_v$ along $c|_{]-\varepsilon,\varepsilon[}$ with $w_v(0) = v$.

Before replacing $]-\varepsilon,\varepsilon[$ by $J$ we prove linearity. By Exercise 3.4c if $w$ is
parallel along $c|_{]-\varepsilon,\varepsilon[}$, $w'$ along $c|_{]-\varepsilon',\varepsilon'[}$ with $w(0) = v$, $w'(0) = v'$, then

$$\nabla_{c^*}(\lambda w) = \lambda\nabla_{c^*}w = 0 , \quad\text{and}\quad \lambda w(0) = \lambda v , \quad\text{for any } \lambda \in \mathbf{R}$$
$$\nabla_{c^*}(w + w') = \nabla_{c^*}w + \nabla_{c^*}w' = 0 , \quad (w + w')(0) = v + v'$$

where defined. Accordingly, $\lambda w$ and $w + w'$ are the unique solution curves
through $\lambda v$ and $v + v'$ where defined. Hence if $|t| < \min\{\varepsilon,\varepsilon'\}$

$$\tau_t(\lambda v) = \lambda w(t) = \lambda\tau_t(v)$$
$$\tau_t(v + v') = (w + w')(t) = w(t) + w'(t) = \tau_t(v) + \tau_t(v')$$

so each $\tau_t$ is linear.

Fig. 4.2

Hence if $v_1, \ldots, v_n$ form a basis for $T_xM$ and $w_i : \,]-\varepsilon_i,\varepsilon_i[\, \to TM$ are the
parallel fields along $c$ with $w_i(0) = v_i$ given by *, and $\varepsilon = \min\{\varepsilon_1,\ldots,\varepsilon_n\}$,
we have $w = a^i w_i$ defined on $]-\varepsilon,\varepsilon[$ with $w(0) = v$ for any $v \in T_xM$, where
$v = a^i v_i$. (Thus we have one $\varepsilon$ that works for all $v \in T_xM$: contrast the field
$w$ in VII.6.01, where we needed smaller and smaller $\varepsilon$ as we got further from
the x-axis.) So if $|t| < \varepsilon$, then $\tau_t : T_xM \to T_{c(t)}M$ is everywhere defined, and
bijective by VII.6.06 (since it is just $\phi_t|_{T_xM}$ where $\phi$ is a local flow for the
horizontal vector field on $N$). Hence it is an isomorphism.

Now if $w$ cannot be extended to a parallel field along $c$ with domain all
of $\mathbf{R}$, there is some $E$ in $\mathbf{R}$ that we cannot reach. Let

$$S = \{\, \varepsilon \mid \exists \text{ parallel } w : \,]-\varepsilon,\varepsilon[\, \to TM , \text{ with } w(0) = v, \text{ along } c|_{]-\varepsilon,\varepsilon[} \,\} \subset [-E, E] .$$

By Exercise VI.4.6b there is a real number $e = \sup S$. It is clear that we can
define $w$ on $]-e, e[$. But we can use * on $c' : t \mapsto c(t - e)$, $c'' : t \mapsto c(t + e)$ to
get local flows that extend $w$ forwards past $e$, backwards past $-e$. (The $\tau_t$
are isomorphisms, so that some $u \in T_{c(e)}M$, for instance, is mapped back to
$w$ of some $s$ in both $]-e, e[$ and the interval $]e - \varepsilon, e + \varepsilon[$ that we have around
$e$ by *.) So $e$ cannot be $\sup S$ after all, so $S$ has no supremum and $w$ can be
defined for all $\mathbf{R}$. □


4.03. Definition. The map $\tau_t : T_xM \to T_{c(t)}M$ introduced in 4.02 is called
parallel transport along $c$ from $x = c(0)$ to $c(t)$, with respect to the connection
on $M$. We shall relate this to the effect of "rolling without turning" along $c$
in 6.07.

Notice again that in general which vector in $T_yM$ is parallel to a given
vector in $T_xM$ depends on our choice of curve from $x$ to $y$. On the sphere, for
instance, any two unit vectors, anywhere, are "parallel" by transport along a
suitably chosen curve (Fig. 2.4). The study of which vectors can be parallel in
a general manifold with connection is the theory of holonomy groups, outside
our present scope; see [Kobayashi and Nomizu].
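The dependence on the curve can be seen numerically. The following is a minimal sketch, not taken from the text: it assumes the unit sphere in spherical coordinates $(\theta, \phi)$ with its usual connection, whose non-zero Christoffel symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$, and integrates the component form $\dot v^k + \Gamma^k_{ij}\dot c^i v^j = 0$ of $\nabla_{c^*}v = 0$ once around the latitude circle $\theta = \theta_0$ with a Runge-Kutta scheme. The function name and step count are illustrative choices.

```python
import math

def transport_around_latitude(theta0, v_theta, v_phi, steps=2000):
    """Parallel-transport (v^theta, v^phi) once around the circle theta = theta0.

    Assumes the unit sphere with its usual connection (an assumption of this
    sketch); the curve is c(t) = (theta0, t), 0 <= t <= 2*pi.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        vt, vp = v
        # dv^k/dt = -Gamma^k_{ij} c'^i v^j with c'(t) = (0, 1):
        #   Gamma^theta_{phi phi} = -sin*cos,  Gamma^phi_{phi theta} = cos/sin
        return (s * c * vp, -(c / s) * vt)

    h = 2.0 * math.pi / steps
    v = (v_theta, v_phi)
    for _ in range(steps):
        # classical fourth-order Runge-Kutta step
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             v[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return v

# At colatitude pi/3 the transported vector comes back rotated through
# 2*pi*cos(pi/3) = pi, i.e. reversed, although the curve is closed.
vt, vp = transport_around_latitude(math.pi / 3, 1.0, 0.0)
```

Transport around the closed latitude circle does not return the vector we started with, whereas the constant curve at the same point obviously does; the length $(v^\theta)^2 + \sin^2\theta_0\,(v^\phi)^2$ is nevertheless unchanged.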

Evidently, a vector field $v$ on $M$ is parallel if and only if $\tau_t(v_p) = v_q$
along all paths from $p$ to $q$, for all $p, q \in M$. We shall examine parallel vector
fields in more detail in Chap. X.

Intuitively, the effect of rolling a tangent space along a curve from p 
to q should be independent of whether we go by a slow roll or a fast: the 
parametrisation should be irrelevant. (For instance, if we stop for a while to 
admire the view we shall not alter the final result.) We prove this now for 
our more precise and intrinsic formulation of parallel transport: 

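4.04. Lemma. If $f = c \circ h$ is a smooth reparametrisation of $c$, and $h^{\leftarrow}(a) = \alpha$,
$h^{\leftarrow}(b) = \gamma$, parallel transport along $f$ from $f(\alpha)$ to $f(\gamma)$ is the same as
parallel transport along $c$ from $c(a)$ to $c(b)$.

Proof. Let $v$ be any parallel vector field along $c$. Then $v' = v \circ h$ is a vector
field along $f$, and if $p = f(t)$, $s = h(t)$ we have

$$\begin{aligned}
\nabla_{f^*}v'(t) &= d_{v'(t)}\bigl(P_{v'(t)}(D_t v'(e_t))\bigr) && \text{by Exercise 3.5c} \\
&= d_{v(s)}\bigl(P_{v(s)}(D_s v \circ D_t h(e_t))\bigr) && \text{by the chain rule} \\
&= \Bigl(\tfrac{dh}{dt}(t)\Bigr)\, d_{v(s)}\bigl(P_{v(s)}(D_s v(e_s))\bigr) && \text{by linearity,}
\end{aligned}$$

since $D_t h$ is just scalar multiplication by $\frac{dh}{dt}(t)$, relative to the bases $\{e_t\}$
and $\{e_s\}$,

$$\begin{aligned}
&= \Bigl(\tfrac{dh}{dt}(t)\Bigr)\, \nabla_{c^*}v(s) \\
&= 0 && \text{since } v \text{ is parallel.}
\end{aligned}$$

Thus $v'$ is also parallel. The result follows by uniqueness. □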
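Lemma 4.04 can be checked numerically in the simplest possible setting. The sketch below is an illustration under stated assumptions, not part of the text: it takes a one-dimensional manifold with coordinate $\theta$ and a hypothetical connection whose single Christoffel symbol is the constant 1, so that a field $v$ along $f(t) = h(t)$ is parallel when $\dot v + \frac{dh}{dt}\,v = 0$. Transport from $\theta = 0$ to $\theta = 1$ is then computed for two different parametrisations; the function and parameter names are illustrative.

```python
import math

def transport_factor(dh, alpha, gamma, christoffel=1.0, steps=2000):
    """Scale factor of parallel transport along f = c o h, t from alpha to gamma.

    dh(t) is the derivative of the reparametrisation h; the connection is the
    hypothetical one-dimensional connection with constant Christoffel symbol.
    """
    def rhs(t, v):
        # component form of the parallel condition: dv/dt = -Gamma * h'(t) * v
        return -christoffel * dh(t) * v

    step = (gamma - alpha) / steps
    t, v = alpha, 1.0
    for _ in range(steps):
        # classical fourth-order Runge-Kutta step
        k1 = rhs(t, v)
        k2 = rhs(t + 0.5 * step, v + 0.5 * step * k1)
        k3 = rhs(t + 0.5 * step, v + 0.5 * step * k2)
        k4 = rhs(t + step, v + step * k3)
        v += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += step
    return v

# h(t) = t: a steady roll from theta = 0 to theta = 1
steady = transport_factor(lambda t: 1.0, 0.0, 1.0)
# h(t) = 3t^2 - 2t^3: starts and ends at rest, same endpoints
uneven = transport_factor(lambda t: 6.0 * t * (1.0 - t), 0.0, 1.0)
```

Both parametrisations give the same factor, $e^{-1}$ for this connection, as the lemma predicts: only the image curve and its endpoints matter.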

We first used the $\tau_t$ (in their avatar of "rolling without turning") in
§2 to produce a $\nabla$; we have now obtained them as "solutions" of a given
connection $\nabla$. To show that they constitute yet another equivalent form of
the connection, we obtain from them the projections $P_v$ and, as a corollary,
the given $\nabla$:

4.05. Theorem. Let $v \in T_pM$, and $c : J \to TM$ represent $w \in T_v(TM)$.
Then if we define

$$Q_v(w) = \tilde c^*(0) \in T_v(T_pM) ,$$

where $\tilde c(t) = \tau_t^{\leftarrow}(c(t))$ along $\Pi \circ c$, $Q_v$ coincides with the projection
$P_v : T_v(TM) \to T_v(T_pM)$ of 3.07.

Proof.

A) If $w \in T_v(T_pM)$ already, $\tilde c$ must be tangent at $0$ and $v$ to $c$, by the
smoothness of parallel transport, so that

$$Q_v(w) = \tilde c^*(0) = c^*(0) = w .$$

B) If $Q_v(w) = 0$, $c$ must be tangent to the horizontal curve through $v$ along
$\Pi \circ c$, that is, $w$ has vertical part $P_v(w) = 0$, and conversely.

C) Restrict attention to the subspace

$$S = (D_v\Pi)^{\leftarrow}\{\, \lambda t \mid \lambda \in \mathbf{R} \,\} \subset T_v(TM)$$

for some non-zero $t \in T_pM$ represented by injective $f : J \to M$. Define
a chart on $N = \Pi^{\leftarrow}(f(J))$ with image in the affine space $T_pM \times \mathbf{R}$ by

$$\phi(n) = \bigl(\tau_{t(n)}^{\leftarrow}(n),\, t(n)\bigr)$$

where $n \in N$, $t(n) = f^{\leftarrow}\Pi(n)$.

Then if $P : T_pM \times \mathbf{R} \to T_pM : (x, t) \mapsto x$ and $P' = P \circ \phi : N \to T_pM$,
$Q_v|_S$ is exactly $D_vP'$, by 4.04. For, any $w \in S$ can be represented by
a curve in $TM$ which is a vector field along some parametrisation of $f$
(unless $D\Pi(w) = 0$, in which case we are covered by A).

So $Q_v|_S$ is linear, being a derivative, and by A and B is idempotent
with the same image and kernel as $P_v|_S$. Hence by simple linear algebra
$Q_v|_S = P_v|_S$.

D) Since any $w \in T_v(TM)$ is in some such $S$, it follows that $Q_v = P_v$. □

4.06. Corollary. If $w$ is a vector field on $M$, and $f : J \to M$ represents
$t \in T_pM$, then

$$\nabla_t w = \lim_{h \to 0} \frac{\tau_h^{\leftarrow}(w_{f(h)}) - w_p}{h} .$$

Proof. If $c = w \circ f$, in the notation of 4.05 we have

$$\begin{aligned}
\lim_{h \to 0} \frac{\tau_h^{\leftarrow}(w_{f(h)}) - w_p}{h}
&= \lim_{h \to 0} \frac{\tilde c(h) - \tilde c(0)}{h} \\
&= d_v(\tilde c^*(0)) \in T_pM && \text{putting } w_p = v \\
&= d_v(P_v(c^*(0))) && \text{by 4.05} \\
&= \nabla_{f^*} w(0) \\
&= \nabla_t w .
\end{aligned}$$


Thus the $\tau_t$ serve to "connect" vectors in nearby tangent spaces so as to
let us differentiate vector fields in the ordinary way, because they give us $\nabla_t w$
as the ordinary tangent vector (freed) to the curve $h \mapsto \tau_h^{\leftarrow}(w_{f(h)})$ in $T_pM$. So
the equivalence between the $\tau_t$ and $\nabla$ may be thought of as one being an
"integrated" version of the other, the other a "differential", local, tangent-
vectorial version of the one. Hence one old name for the $\Gamma^k_{ij}$ of "infinitesimal
connection". In exactly the same way a vector field's property of being the
differential of the transformations $\phi_t$ obtained by integrating it (VII.§6)
explains the old term "infinitesimal transformation" for a vector field. Similarly,
"infinitesimal displacement" for a tangent vector at a point.

4.07. Corollary. 


Exercises VIII.4 

1. a) Suppose that manifolds $M$, $N$ have connections $\nabla^M$, $\nabla^N$ respectively.
Using the decomposition in Exercise VII.3.2c of any vector
$v$ in $T_{(p,q)}(M \times N)$ into $v^M + v^N$, where $v^M \in T_pM$, $v^N \in T_qN$, show
that

$$\nabla_u v = \nabla^M_{u^M} v^M + \nabla^N_{u^N} v^N$$

defines a connection on $M \times N$, the product connection of $\nabla^M$
and $\nabla^N$.

b) Show that a vector field $w$ along a curve $c : J \to M \times N$ is parallel
with respect to $\nabla^{M \times N}$ if and only if both $w^M$ and $w^N$ are parallel
with respect to $\nabla^M$, $\nabla^N$ along $c^M$, $c^N$ respectively, where $c(t) =
(c^M(t), c^N(t)) \in M \times N$.

c) Deduce that a vector field $w$ along a curve $c$ in $M$ is parallel if and
only if $\tilde w$, defined by

$$\tilde w(t) = (w(t) + k(t)) \in T_{c(t)}M \oplus T_{\sigma(t)}\mathbf{R} \cong T_{(c(t),\sigma(t))}(M \times \mathbf{R}) ,$$

is parallel along $\tilde c : t \mapsto (c(t), \sigma(t)) \in M \times \mathbf{R}$, where $k$ is a given vector
field along a curve $\sigma$ in $\mathbf{R}$, parallel with respect to the connection used
on $\mathbf{R}$.

2. a) Show that if $\mathbf{R}$ has the connection given by

$$\nabla_t w = d_x^{\leftarrow}(\tilde w^*(0))$$

where $\hat t$ is a curve representing $t$, $x = \hat t(0)$, and $\tilde w(s) = d_{\hat t(s)}(w(\hat t(s)))$,
then a vector field along a curve $c$ in $\mathbf{R}$ is parallel if and only if it
is the restriction to $c$ of a constant vector field on $\mathbf{R}$ (in the sense of
VII.3.05).

b) Deduce that if $c : \mathbf{R} \to \mathbf{R}$ is the identity, then $c^*$ is a parallel vector
field along $c$ with respect to this connection.

3. a) If $c : J \to M$ is a curve in $M$, define $\tilde c : J \to M \times \mathbf{R} : t \mapsto (c(t), t)$
and define a manifold structure on the set $N = \Pi^{\leftarrow}(\tilde c(J))$ of tangent
vectors to $M \times \mathbf{R}$ at points $\tilde c(t)$, $t \in J$.

b) Show that if $v \in T(T(M \times \mathbf{R}))$ has $D\Pi(v) = \tilde c^*(t)$ for some $t$, then $v$
can be represented by a smooth curve in $N$ and hence can be considered
as a tangent vector to $N$.

c) Deduce that the map taking $v \in T_{\tilde c(t)}(M \times \mathbf{R})$ to $t_v = \overline{\tilde c^*(t)} \in
T_v(T(M \times \mathbf{R}))$, in the notation of Exercise 3.6b, defines a smooth
vector field $t$ on $N$.

This is called the horizontal vector field on $N$; any solution curve
of $t$, considered as a curve in $T(M \times \mathbf{R})$, is a horizontal curve of the
connection. (Evidently a curve $c$ in $T(M \times \mathbf{R})$ is horizontal if and only
if, considered as a vector field along $\Pi \circ c$, it is parallel.)

d) Show that any solution curve $w : \,]-\varepsilon,\varepsilon[\, \to N$ with $\Pi(w(0)) = \tilde c(0)$
has $\Pi(w(t)) = \tilde c(t)$ for all $t \in \,]-\varepsilon,\varepsilon[$, so that $w$ may be thought of as
a vector field $w$ along $\tilde c|_{]-\varepsilon,\varepsilon[}$.

e) Deduce via 1c and 2b that $w^M$ is parallel along $c|_{]-\varepsilon,\varepsilon[}$.

4. If $M = \{\, (r, \theta) \mid r = 1 \,\}$, the circle in polar coordinates, with respect
to any chart $U : M \to \mathbf{R} : (r, \theta) \mapsto \theta$ let $\nabla$ on $M$ be given by $\Gamma^1_{11} = 1$.
Drawing $TM$ as a cylinder, sketch the horizontal curves in $TM$ and
show that although by 4.02 there is a parallel vector field through any
vector along any curve, there is no non-zero parallel vector field on $M$
with respect to $\nabla$.


5. Torsion and Symmetry 

If, as in 3.01.C v), we have two vector fields $t$ and $w$ we can use a connection
to differentiate either with respect to the other, getting either $\nabla_t w$ or $\nabla_w t$.
By 4.06 $\nabla_{t(p)} w$ is "how $w$ varies as we flow along $t$ through $p$" and vice versa
for $\nabla_{w(p)} t$. It would be nice if the results were necessarily the same, but in
general things cannot be quite so neat. In Fig. 5.1 $u$ is "constant" along
the solution curves of $v$ (parallel along them with the usual connection), so
$\nabla_v u = 0$ everywhere, while clearly $\nabla_u v \neq 0$. So before deciding whether
$\nabla$ fails to be symmetrical in its effects on $u$ and $v$, we must correct for any
lack of symmetry between flowing along $u$ and along $v$, themselves. As this
should suggest, the appropriate fudge factor is the Lie bracket discussed in
VII.§7:

Fig. 5.1

5.01. Definition. The torsion of a connection $\nabla$ on $M$ is the map

$$T : T^1M \times T^1M \to T^1M : (u, v) \mapsto \nabla_u v - \nabla_v u - [u, v]$$

(Recall, from VII.3.04, that $T^1M$ is the space of contravariant vector fields
on $M$.)

If $T$ is identically zero, $\nabla$ is symmetric.

5.02. Lemma. $\nabla$ is symmetric if and only if whenever $u$, $v$ commute,

$$\nabla_u v = \nabla_v u .$$

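Proof.

i) If $\nabla$ is symmetric, then for any $u$, $v$ with $[u, v] = 0$

$$0 = T(u, v) = \nabla_u v - \nabla_v u - [u, v] = \nabla_u v - \nabla_v u .$$

So

$$\nabla_u v = \nabla_v u .$$

ii) If $\nabla_u v = \nabla_v u$ for all commuting $u$, $v$, then for non-commuting $u$, $v$ we
work locally. In a chart we have

$$u = u^i \partial_i , \qquad v = v^j \partial_j$$

(VII.4.02) and the $\partial_i$ all commute (Exercise VII.7.2c). Hence in the
domain of the chart the vector field $T(u, v)$ has

$$\begin{aligned}
T(u, v) &= \nabla_{u^i\partial_i}(v^j\partial_j) - \nabla_{v^j\partial_j}(u^i\partial_i) - [u^i\partial_i, v^j\partial_j] \\
&= u^i(\partial_i v^j)\partial_j + u^i v^j(\nabla_{\partial_i}\partial_j) - v^j(\partial_j u^i)\partial_i \\
&\qquad - v^j u^i(\nabla_{\partial_j}\partial_i) - \bigl(u^i(\partial_i v^j)\partial_j - v^j(\partial_j u^i)\partial_i\bigr)
\end{aligned}$$

by 3.02, and Exercise VII.7.2a with a switch of dummy indices

$$\begin{aligned}
* \qquad &= u^i v^j(\nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i) \\
&= 0 \qquad \text{by hypothesis, since } \partial_i, \partial_j \text{ commute.}
\end{aligned}$$

□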

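5.03. Corollary. If around every $x \in M$ there is some chart giving Christoffel
symbols for $\nabla$ such that $\Gamma^k_{ij} = \Gamma^k_{ji}$, $\nabla$ is symmetric. Conversely, if $\nabla$ is
symmetric all charts give Christoffel symbols with $\Gamma^k_{ij} = \Gamma^k_{ji}$.

Proof. The above proof showed that symmetry in the domain of a chart is
equivalent to

$$\nabla_{\partial_i}\partial_j = \nabla_{\partial_j}\partial_i \qquad \forall i, j,$$

and to

$$\Gamma^k_{ij}\partial_k = \Gamma^k_{ji}\partial_k \qquad \forall i, j$$

by 3.02, and therefore to

$$\Gamma^k_{ij} = \Gamma^k_{ji}$$

since the $\partial_k$ are a basis. □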

Fairly plainly, $(\nabla_u v)_p$, $(\nabla_v u)_p$ and $[u, v]_p$ can be changed by substituting
new fields $v'$ and $u'$ with $v'_p = v_p$ and $u'_p = u_p$ but passing differently
through these values (because we can always make $[u', v']_p = 0$ for any given
$u_p$, $v_p$). In contrast, for their combination into $T$ we have

5.04. Lemma. The vector $(T(u, v))_p \in T_pM$ depends only on $u_p$ and $v_p$,
and depends on them bilinearly, given $\nabla$.

Proof. Putting the $\Gamma^k_{ij}$ into * of 5.02, in a chart

$$T(u, v) = u^i v^j(\Gamma^k_{ij} - \Gamma^k_{ji})\partial_k ,$$

so at a point, $(T(u, v))_p = u^i(p)v^j(p)\bigl(\Gamma^k_{ij}(p) - \Gamma^k_{ji}(p)\bigr)\partial_k(p)$.

The result follows. □

5.05. The Torsion Tensor. $T$ thus specifies, and is specified by, a bilinear
map

$$T_pM \times T_pM \to T_pM$$

for each $p$, taking $(u_p, v_p)$ to $(\nabla_{u(p)}v - \nabla_{v(p)}u - [u, v]_p)$ where $u$, $v$ are
arbitrary vector fields with $u(p) = u_p$, $v(p) = v_p$. This corresponds to
specifying an element $T_p$ of

$$T_p^*M \otimes T_p^*M \otimes T_pM$$

for each $p$, by a proof like that of V.1.08; in coordinates,

$$T = (\Gamma^k_{ij} - \Gamma^k_{ji})\, dx^i \otimes dx^j \otimes \partial_k .$$

From this we can recover the map $T : T^1M \times T^1M \to T^1M$ as a double
contraction.

$$\bigl[(\Gamma^k_{ij} - \Gamma^k_{ji})\, dx^i \otimes dx^j \otimes \partial_k\bigr] \otimes u^\alpha\partial_\alpha \otimes v^\beta\partial_\beta
\mapsto (\Gamma^k_{ij} - \Gamma^k_{ji})\, dx^i(u^\alpha\partial_\alpha)\, dx^j(v^\beta\partial_\beta)\, \partial_k .$$

That is,

$$T \otimes (u \otimes v) \mapsto u^i v^j(\Gamma^k_{ij} - \Gamma^k_{ji})\partial_k ,$$

locally.
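The local formula $T(u, v) = u^i v^j(\Gamma^k_{ij} - \Gamma^k_{ji})\partial_k$ is easy to evaluate directly. The sketch below is only an illustration: the array layout `gamma[k][i][j]` for $\Gamma^k_{ij}$ (with zero-based indices) and the sample connection on $\mathbf{R}^2$, with a single non-zero symbol, are assumptions made for the example.

```python
def torsion(gamma, u, v):
    """Components of T(u, v) at a point: u^i v^j (Gamma^k_ij - Gamma^k_ji).

    gamma[k][i][j] holds Gamma^k_ij at the point; u and v are component lists.
    """
    n = len(u)
    return [
        sum(u[i] * v[j] * (gamma[k][i][j] - gamma[k][j][i])
            for i in range(n) for j in range(n))
        for k in range(n)
    ]

n = 2
# Hypothetical connection on R^2: one non-zero symbol, Gamma^1_12 = 1
# (stored at zero-based position [0][0][1]).
gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
gamma[0][0][1] = 1.0

e1, e2 = [1.0, 0.0], [0.0, 1.0]
t12 = torsion(gamma, e1, e2)   # [1.0, 0.0]
t21 = torsion(gamma, e2, e1)   # [-1.0, 0.0]: T is skew in its arguments
```

Only the values of $u$ and $v$ at the point enter, in line with Lemma 5.04, and replacing `gamma` by its symmetrisation in the lower indices makes every component vanish, as in Corollary 5.03.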

$T$, then, is essentially a $\binom{1}{2}$-tensor. We defer the discussion of its
geometric nature to a future volume where space will permit consideration of
significant examples with $T \neq 0$. For instance, $T$ can describe a "crystal
dislocation density" in a continuum model for matter. For the present we
are concerned only with the geometric implications of its vanishing.

Given a connection $\nabla$ for which $T = 0$, we have

$$[u, v]_p = \nabla_{u_p}v - \nabla_{v_p}u$$

which is sometimes used to define the Lie bracket. However this obscures
the latter's independence of any connection or metric tensor, and conceals
its closer relation to flowing along solution curves than rolling along them
with the $\tau_t$ (compare Exercise VII.7.4 with 4.06). Notice in particular that
for a flow $D_x\phi_t$ is always an isomorphism but not usually an isometry: for
instance, on $\mathbf{R}^2$, if $v(x, y) = xe_1 + ye_2$, what is $D_{(0,0)}\phi_t$ for the corresponding
flow? (Exercise 1)
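A numerical experiment makes the contrast with parallel transport concrete. The following sketch assumes the usual coordinates and metric on $\mathbf{R}^2$; it integrates the field $v(x, y) = xe_1 + ye_2$ with a Runge-Kutta scheme and estimates the derivative of the flow at the origin by central differences. Function names, step counts and the difference step are illustrative choices.

```python
import math

def field(p):
    """The vector field v(x, y) = x e1 + y e2."""
    return (p[0], p[1])

def flow(t, p, steps=1000):
    """Approximate phi_t(p) by classical fourth-order Runge-Kutta."""
    h = t / steps
    for _ in range(steps):
        k1 = field(p)
        k2 = field((p[0] + 0.5 * h * k1[0], p[1] + 0.5 * h * k1[1]))
        k3 = field((p[0] + 0.5 * h * k2[0], p[1] + 0.5 * h * k2[1]))
        k4 = field((p[0] + h * k3[0], p[1] + h * k3[1]))
        p = (p[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             p[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return p

def derivative_at_origin(t, d=1e-3):
    """Matrix of D_(0,0) phi_t, estimated by central differences."""
    cols = []
    for e in ((d, 0.0), (0.0, d)):
        plus = flow(t, e)
        minus = flow(t, (-e[0], -e[1]))
        cols.append(((plus[0] - minus[0]) / (2 * d),
                     (plus[1] - minus[1]) / (2 * d)))
    # cols[j] is the image of the j-th basis vector; return the matrix by rows
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

jac = derivative_at_origin(1.0)
```

For $t = 1$ the matrix is, to numerical accuracy, $e$ times the identity: invertible, but stretching every length by the factor $e$, so not an isometry.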


Exercises VIII. 5 

1. a) Show that if $v(x, y) = xe_1 + ye_2$, the corresponding flow has $\phi_t(x, y) =
(e^t x, e^t y)$.

b) Deduce that $D_{(0,0)}\phi_t : T_{(0,0)}\mathbf{R}^2 \to T_{(0,0)}\mathbf{R}^2$ is $e^t I$.

2. a) Prove (similarly to Lemma 5.04) that if $\nabla$, $\tilde\nabla$ are connections on $M$,
their difference

$$S(u, v) = \tilde\nabla_u v - \nabla_u v$$

is essentially a $\binom{1}{2}$-tensor field.

b) Show that for any connection $\nabla$ and any $\binom{1}{2}$-tensor field $S$ on $M$, the
formula

$$\tilde\nabla_u v = \nabla_u v + S(u, v)$$

defines another connection $\tilde\nabla$ (that is, $\tilde\nabla$ satisfies Definition 3.01).

6. Metric Tensors and Connections 

6.01. Definition. A connection $\nabla$ on a manifold $M$ with a metric tensor $G$
is compatible with $G$ if all the parallel transports $\tau_t$ it defines are isometries:
we require

$$G_q(\tau_t u_p, \tau_t v_p) = G_p(u_p, v_p)$$

for all $p, q \in M$, $u_p, v_p \in T_pM$, along all curves.

That is, parallel transport must preserve lengths and angles: an obvious 
condition if it is to correspond to the rollings around of §2. A slightly less 
obvious one relates to the torsion tensor of V. 

On $\mathbf{R}^2$, for instance, we can define a connection in the usual coordinates
with $\Gamma^1_{12} = 1$ and the other $\Gamma^k_{ij} = 0$. The corresponding parallel transport,
along any curve from $(x, y)$ to $(x', y')$, amounts to "rotation through
$\sin^{\leftarrow}(x - x')$" relative to the usual idea of parallelism (Exercise 1). Since
rotations are isometries, this connection is thus compatible with the metric,
but clearly it is not the one we usually want.

In general, the torsion tensor relates to this twisting of parallel transport 
away from the kind we want, which corresponds to rolling without turning. 
Hence we shall generally require connections to have torsion zero: non-zero 
torsion represents “extra structure” in a sense we outline in 6.09. 

These two conditions, compatibility with $G$ and symmetry, suffice to
determine the connection appropriate to $M$ with $G$. In this volume our only
further straying from zero torsion is in Exercise IX.1.2, whose only purpose
is to highlight the odd geometry of the above example.

We confine ourselves henceforward to a specific $M$ and $G$, and abbreviate
$G(u, v)$ to $u \cdot v$. First, let us reduce 6.01 to a local condition:

6.02. Lemma. $\nabla$ is compatible with $G$ if and only if for any vector fields
$u$, $v$ along any curve $c : J \to M$ and $t \in J$,

$$* \qquad \frac{d}{ds}(u \cdot v)(t) = (\nabla_{c^*}u(t)) \cdot v(t) + u(t) \cdot \nabla_{c^*}v(t)$$

along $c$, where $\nabla_{c^*}$ is obtained from $\nabla$ as in 3.05.
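Proof. First suppose * holds everywhere. Then for any parallel vector field
$w$ along $c$ we have

$$\frac{d}{ds}(w \cdot w) = \nabla_{c^*}w \cdot w + w \cdot \nabla_{c^*}w = 0 \cdot w + w \cdot 0 = 0 .$$

Thus $w \cdot w$ is constant along $c$, so that if $c(0) = p$

$$** \qquad \tau_t w_p \cdot \tau_t w_p = w_{c(t)} \cdot w_{c(t)} = w_p \cdot w_p$$

along $c$ for any $w_p \in T_pM$. Consequently

$$\tfrac14\bigl(\tau_t(u_p + v_p) \cdot \tau_t(u_p + v_p) - \tau_t(u_p - v_p) \cdot \tau_t(u_p - v_p)\bigr)
= \tfrac14\bigl((u_p + v_p) \cdot (u_p + v_p) - (u_p - v_p) \cdot (u_p - v_p)\bigr)$$

applying ** with $u_p + v_p$ and $u_p - v_p$ for $w_p$. Expanding and cancelling,

$$\tau_t u_p \cdot \tau_t v_p = u_p \cdot v_p .$$

Conversely, if $\nabla$ is compatible with $G$, choose an orthonormal basis
(IV.3.05) $b_1, \ldots, b_n$ for $T_pM$. Then the parallel fields $\beta_i(t) = \tau_t(b_i)$ along
any $c : J \to M$ with $c(0) = p$ give an orthonormal basis for $T_{c(t)}M$, $\forall t \in J$,
since $\tau_t$ is an isometry. Hence any $u$, $v$ along $c$ can be written as $u = u^i\beta_i$,
$v = v^j\beta_j$, and

$$\begin{aligned}
\frac{d}{ds}(u \cdot v) &= \frac{d}{ds}\bigl((u^i\beta_i) \cdot (v^j\beta_j)\bigr) \\
&= \frac{d}{ds}(u^1v^1 + \cdots + u^nv^n) \\
&= \Bigl(\frac{du^1}{ds}v^1 + \cdots + \frac{du^n}{ds}v^n\Bigr) + \Bigl(u^1\frac{dv^1}{ds} + \cdots + u^n\frac{dv^n}{ds}\Bigr) .
\end{aligned}$$

Now

$$\begin{aligned}
\nabla_{c^*}(f\beta_i) &= \frac{df}{ds}\beta_i + f\,\nabla_{c^*}\beta_i , && \text{by Exercise VIII.3.4b for any } f, \beta_i; \\
&= \frac{df}{ds}\beta_i && \text{since } \beta_i \text{ is parallel by construction.}
\end{aligned}$$

Thus

$$\begin{aligned}
\frac{d}{ds}(u \cdot v) &= (\nabla_{c^*}u^i\beta_i) \cdot v + u \cdot (\nabla_{c^*}v^j\beta_j) \\
&= \nabla_{c^*}u \cdot v + u \cdot \nabla_{c^*}v ,
\end{aligned}$$

as required.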
6.03. Corollary. $\nabla$ is compatible with $G$ if and only if

$$w(u \cdot v) = (\nabla_w u) \cdot v + u \cdot (\nabla_w v)$$

for all $w \in T_pM$, $u, v \in T^1M$.

Proof. Apply 6.02 to a curve representing $w$. □

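6.04. Theorem. There exists exactly one symmetric connection $\nabla$ compatible
with any given metric tensor field $G$.

Proof. Suppose $\nabla$ exists: we must express it in terms of $G$.

Rather than look for $\nabla_u v$ directly, given $u$ and $v$, the trick is to find first
$G_\downarrow(\nabla_u v)$ (cf. VII.4.05), that is, the covariant vector field $w \mapsto (\nabla_u v) \cdot w$.
By 6.03 we have

$$* \qquad (\nabla_u v) \cdot w = u(v \cdot w) - v \cdot (\nabla_u w)$$

and by 5.01 and $\nabla$'s supposed symmetry,

$$** \qquad \nabla_u w = \nabla_w u + [u, w] .$$

Hence

$$\begin{aligned}
(\nabla_u v) \cdot w &= u(v \cdot w) - v \cdot (\nabla_w u + [u, w]) \\
&= u(v \cdot w) - (\nabla_w u) \cdot v - v \cdot [u, w] && (G \text{ symmetric}) \\
&= u(v \cdot w) - \bigl(w(u \cdot v) - u \cdot (\nabla_w v)\bigr) - v \cdot [u, w] && \text{by *} \\
&= u(v \cdot w) - w(u \cdot v) + u \cdot (\nabla_v w + [w, v]) - v \cdot [u, w] && \text{by **} \\
&= u(v \cdot w) - w(u \cdot v) + (\nabla_v w) \cdot u + u \cdot [w, v] - v \cdot [u, w] \\
&= u(v \cdot w) - w(u \cdot v) + \bigl(v(w \cdot u) - w \cdot (\nabla_v u)\bigr) + u \cdot [w, v] - v \cdot [u, w] && \text{by *} \\
&= u(v \cdot w) - w(u \cdot v) + v(w \cdot u) - w \cdot (\nabla_u v + [v, u]) + u \cdot [w, v] - v \cdot [u, w] && \text{by **} \\
&= u(v \cdot w) - w(u \cdot v) + v(w \cdot u) - (\nabla_u v) \cdot w - w \cdot [v, u] + u \cdot [w, v] - v \cdot [u, w]
\end{aligned}$$

so that

$$*** \qquad 2(\nabla_u v) \cdot w = u(v \cdot w) - w(u \cdot v) + v(w \cdot u) - w \cdot [v, u] + u \cdot [w, v] - v \cdot [u, w] .$$

Thus if $\nabla$ exists it satisfies ***, which fixes $G_\downarrow(\nabla_u v)$, and therefore $\nabla_u v$
uniquely. It remains to prove existence.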

The value at $p \in M$ of the expression on the right of *** (call it
$z(u, v, w)$ for short) depends for fixed vector fields $u$ and $v$ only, and
linearly, on the value $w_p$ of $w$ at $p$ (Exercise 2a-c). Hence since all vectors
in $T_pM$ can occur as $w_p$, we have a well-defined linear functional

$$(Y(u, v))_p : T_pM \to \mathbf{R} : w_p \mapsto \tfrac12 z(u, v, w)(p)$$

where $w$ is any vector field with $w(p) = w_p$. If we now define

$$\nabla_u v = G_\uparrow(Y(u, v))$$

we have a contravariant vector field on $M$ which satisfies ***. A precisely
similar proof to Exercise 2 shows that $z(u, v, w)$, and hence $\nabla_u v$, depends
only, and linearly, on $u_p$ if we fix vector fields $v$ and $w$, so we have a well
defined $\nabla_{u_p}v$ with C i) and C ii) of 3.01 satisfied. C iii) follows from the
immediate fact that

$$z(u, v + v', w) = z(u, v, w) + z(u, v', w) ,$$

and C iv) by expanding $z(u, fv, w)$ to get (Exercise 2d)

$$z(u, fv, w) = f z(u, v, w) + 2u(f)(v \cdot w)$$

so that

$$\nabla_{u_p}(fv) \cdot w_p = (f(p)\nabla_{u_p}v) \cdot w_p + (u_p(f)v_p) \cdot w_p \qquad \forall w_p$$

and hence

$$\nabla_{u_p}(fv) = f(p)\nabla_{u_p}v + u_p(f)v_p$$

as required.

C v) follows from the fact that $G_\uparrow$ and the operations by which $z(u, v, w)$
is defined all give $C^\infty$ results from $C^\infty$ data.

Hence $\nabla$ is indeed a connection. Similarly to the proof of C iv),
compatibility with $G$ follows directly from

$$z(u, v, w) + z(u, w, v) = 2u(v \cdot w)$$

and 6.03, and symmetry from

$$z(u, v, w) - z(v, u, w) = 2[u, v] \cdot w .$$

These equations result from simply writing out their left-hand sides in full
and collecting terms, using the symmetry of $G$ and the skew-symmetry of
$[\ ,\ ]$. □

6.05. Definition. The unique symmetric connection compatible with $G$ is
called the Levi-Civita connection for $G$. From now on, $\nabla$ (on a manifold
for which we have a metric tensor field) will always refer to this connection
unless we explicitly state otherwise (cf. Exercise 2.1 above).

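6.06. Components. For the $\partial_1, \ldots, \partial_n$ given by a chart, the Lie brackets in
*** of the proof of 6.04 vanish, so

$$\begin{aligned}
(\nabla_{\partial_i}\partial_j) \cdot \partial_k &= \tfrac12\bigl(\partial_i(\partial_j \cdot \partial_k) - \partial_k(\partial_i \cdot \partial_j) + \partial_j(\partial_k \cdot \partial_i)\bigr) \\
&= \tfrac12\bigl(\partial_i(g_{jk}) - \partial_k(g_{ij}) + \partial_j(g_{ki})\bigr)
\end{aligned}$$

by the definition (IV.3.01) of the components of $G$. The function $(\nabla_{\partial_i}\partial_j) \cdot \partial_k$,
giving (for $G$ Riemannian) the component, by orthogonal projection at each
point, of $\nabla_{\partial_i}\partial_j$ in the $\partial_k$ direction, is called for historical reasons a Christoffel
symbol of the first kind and denoted by $\Gamma_{ijk}$. The full name of the $\Gamma^k_{ij}$ we
have already met is "Christoffel symbols of the second kind." We have

$$\begin{aligned}
\tfrac12(\partial_i g_{jl} - \partial_l g_{ij} + \partial_j g_{li}) &= (\Gamma^m_{ij}\partial_m) \cdot \partial_l \\
&= \Gamma^m_{ij} g_{ml}
\end{aligned}$$

so

$$\begin{aligned}
\tfrac12 g^{kl}(\partial_i g_{jl} - \partial_l g_{ij} + \partial_j g_{li}) &= \Gamma^m_{ij} g_{ml} g^{kl} \\
&= \Gamma^m_{ij}\delta^k_m \\
&= \Gamma^k_{ij}
\end{aligned}$$

by the usual formulae. So we can apply the usual formula for raising indices,
setting

$$\Gamma^k_{ij} = g^{kl}\Gamma_{ijl} ,$$

although this is not simply an application of $G_\uparrow$ since the $\Gamma_{ijl}$ do not
constitute a tensor.

An occasionally useful fact is that

$$\begin{aligned}
\Gamma_{ijk} + \Gamma_{jki} &= \tfrac12(\partial_i g_{jk} - \partial_k g_{ij} + \partial_j g_{ki}) + \tfrac12(\partial_j g_{ki} - \partial_i g_{jk} + \partial_k g_{ij}) \\
&= \partial_j g_{ki} \qquad \text{for all } i, j, k.
\end{aligned}$$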

6.07. Rolling. Neither Theorem 6.04 nor the formulae above are outstandingly
geometrical, so it is worth rigorously tying the Levi-Civita connection
to the "rolling" approach we first discussed.

If $M$ is embedded in an affine space $X$ with vector space $T$ and a constant
metric inducing the metric tensor field $G$ on $M$, and $c : J \to M$ has $c(0) = p$,
let $A$ be any isometry from $\mathbf{R}^n$ (with a metric of the appropriate signature)
to $T_pM$. Then if we define

$$A_t = d_{c(t)} \circ \tau_t \circ A : \mathbf{R}^n \to X ,$$

where $\tau_t$ is parallel transport along $c$ from $T_pM$ to $T_{c(t)}M \subset T_{c(t)}X$ with
respect to the Levi-Civita connection for $G$, we have the unique family $A_t$
satisfying (i), (ii), (iii) of Exercise 2.1. For condition (i) follows from the
requirement that $\nabla$ be compatible with $G$, and (ii) holds by construction. It
is clear that the curve $c_x$ of Exercise 2.1 is exactly the curve $t \mapsto d_{c(t)}\tilde x(t)$,
where $\tilde x$ is the parallel vector field along $c$ through $A(x)$. It remains only to
show that (iii) follows from the facts that $\tilde x$ is parallel, for all $x \in \mathbf{R}^n$, and
that $\nabla$ is symmetric: this we leave as an exercise (Exercise 3) in components.
Notice that this proves the existence of the $A_t$, and that uniqueness follows
similarly, since essentially what is involved is the equivalence of the system
of differential equations defining parallelism to condition (iii), for $\nabla$ symmetric.
Hence the differentiation of §2 is indeed that given by the Levi-Civita
connection.

Thus we have rigorously established the "rolling" picture, which is useful
for testing our intuition about the Levi-Civita connection: Fig. 4.1 for instance
had the connections for the normal Riemannian metrics on the plane
and sphere in mind.

On a point of language: notice that with the normal connection on
the plane the field shown in Fig. 6.1 is not parallel in the sense of 4.01,
since parallel transport preserves lengths as well as angles, though in the
elementary sense of "having the same direction" the vectors are parallel. The
alternatives are to redefine "parallel", to use some other word like "constant"
which would involve worse confusion, or invent something like "translationally
congruent". The universal choice is the first.

Fig. 6.1

Rolling vectors around on a manifold with an indefinite metric takes a 
little more imagination: notice in particular that since it preserves the lengths 
of vectors it takes timelike/null/spacelike vectors to others of the same kind. 
There can be nothing like the way any direction on the sphere (with the usual 
connection) can be rolled/parallel transported to any other. 

6.08. Signs. In terms of rolling, it is clear that replacing $G$ by $-G$ should
not change the resulting parallel transport (nor, hence, the connection) since
the same maps are isometries. In components it can be seen that replacing
$g_{ij}$ by $-g_{ij}$ leaves $\Gamma^k_{ij} = \tfrac12 g^{kl}(\partial_i g_{jl} - \partial_l g_{ij} + \partial_j g_{li})$ (by 6.06) unchanged.

Thus neither the sign of $\nabla$ nor that of the Riemann tensor we define
from it in Chapter X is altered if we change from a $(+ - - -)$ metric tensor
on spacetime to the equally popular $(- + + +)$.
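The component formula of 6.06 and the sign remark above can both be checked numerically. The sketch below is an illustration under stated assumptions, none of them from the text: it works in two dimensions only, approximates the partial derivatives $\partial_l g_{ij}$ by central differences, and uses as test case the round metric $d\theta^2 + \sin^2\theta\, d\phi^2$ on the unit sphere in spherical coordinates. The array layout `gamma[k][i][j]` for $\Gamma^k_{ij}$ and all function names are illustrative.

```python
import math

def christoffel_2d(metric, x, h=1e-5):
    """Gamma[k][i][j] = (1/2) g^{kl} (d_i g_jl - d_l g_ij + d_j g_li) at x.

    metric(x) returns the 2x2 matrix g_ij at the point x = (x^1, x^2);
    derivatives are taken by central differences with step h.
    """
    n = 2
    dg = []  # dg[l][i][j] approximates the partial derivative d_l g_ij
    for l in range(n):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    g = metric(x)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
    return [[[0.5 * sum(ginv[k][l] * (dg[i][j][l] - dg[l][i][j] + dg[j][l][i])
                        for l in range(n))
              for j in range(n)]
             for i in range(n)]
            for k in range(n)]

def sphere_metric(x):
    """Round metric on the unit sphere in coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def negated_sphere_metric(x):
    """The same metric with every component's sign reversed."""
    return [[-a for a in row] for row in sphere_metric(x)]

point = (1.0, 0.3)
gamma = christoffel_2d(sphere_metric, point)
gamma_neg = christoffel_2d(negated_sphere_metric, point)
```

At this point `gamma[0][1][1]` approximates $-\sin\theta\cos\theta$ and `gamma[1][0][1]` approximates $\cot\theta$, and `gamma_neg` agrees with `gamma` entry by entry: the sign of $g^{kl}$ and the sign of the bracket change together.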
6.09. Naturality. We have seen the way that a finite-dimensional vector
space $V$ is isomorphic to its dual $V^*$, but not "naturally" so. To choose
a particular isomorphism $A : V \to V^*$ is equivalent to choosing a
non-degenerate bilinear form $B$ such that $B(u, v) = Au(v)$ on $V$; for example, a
metric tensor. Such an extra structure contains all the geometric possibilities
of Chap. IV, and more besides. Similarly, a metric tensor field is considerable
extra structure for a manifold. Given such a field, how much extra does a
connection represent?

We have seen that a metric $G$ determines a particular $\nabla$ via the conditions
of compatibility and symmetry. It is a recent result in [Stredder] that
$G$ determines this $\nabla$ via merely the condition that $\nabla$ should not represent a
further choice of structure, in a sense roughly as follows.

If $N'$ is an open subset of a manifold $N$, it is also a manifold in an obvious
way. It is clear how to "restrict" a connection $\nabla_N$ or metric tensor $G_N$ on $N$
to $\nabla_N|_{N'}$ or $G_N|_{N'}$ on $N'$. Suppose we have some rule that assigns to every
manifold-with-metric-tensor $(M, G_M)$ a connection $\nabla^{G_M}$ on $M$. Suppose
also that whenever $N'$ is an open subset of $(N, G_N)$, the connection $\nabla^{G_N|_{N'}}$
we get by applying the rule to $(N', G_N|_{N'})$ coincides with the connection
$\nabla^{G_N}|_{N'}$ we get by applying the rule to $(N, G_N)$ and restricting the result to
$N'$. Then the rule can only be "Choose the Levi-Civita connection"! Both
compatibility with the metric and symmetry turn out to be consequences of
"naturality with respect to restrictions".

So given $(M, G)$ we get the Levi-Civita connection $\nabla$ free in the same
package: any other connection $\tilde\nabla$ we pay for with a special choice, representing
extra structure. (In physics this would generally mean another force or
form of matter added to the theory.) By Exercise 5.2 we see that the special
choice involved is exactly that of the $\binom{1}{2}$-tensor field describing the difference
between $\nabla$ and $\tilde\nabla$.


Exercises VIII.6 

1. Prove that parallel transport on $\mathbf{R}^2$ by the connection given in the
usual coordinates by $\Gamma^1_{12} = 1$, the other $\Gamma^k_{ij} = 0$, is as described in the
discussion after 6.01. (Either translate the description into a formula
for a general horizontal curve and prove that it solves the differential
equation, or apply Theorem 4.06.)

2. In the proof of 6.04:

a) Show that $z(u, v, w + w') = z(u, v, w) + z(u, v, w')$.

b) Show that $u(v \cdot fw) = u(f)(v \cdot w) + f(u(v \cdot w))$. Use this and
Exercise VII.7.5 to show that $z(u, v, fw) = f(z(u, v, w))$.

c) Show that for any linear $F : T^1M \to T^0_0M$ with $F(fw) = f(F(w))$
$\forall w, f$, $(F(w))_p$ depends only on $w_p$. (Show that the question is local
by taking $f$ zero outside the domain of a chart around $p$. In this chart write
$w = w^i\partial_i$ and consider $w$ and $w'$ with $w'(p) = w(p)$.) Deduce that
$Y(u, v)$ is a covariant vector field.

(Note the similarity between this proof and the proof (5.04) that the
torsion of a connection is a tensor field.)

d) Prove that $z(u, fv, w) = f z(u, v, w) + 2u(f)(v \cdot w)$.

3. a) In the situation of 6.07, take coordinates $(x^1, \ldots, x^N)$ on $X$ corresponding
to some choice of an origin for $X$ and orthonormal basis
for $T$, take a chart giving coordinates $(q^1, \ldots, q^n)$ for point $q$ in a
neighbourhood of $p \in M$, and define $x^i(q)$ = $i$-th coordinate of $q$ as a
point in $X$, $i = 1, \ldots, N$. Write out everything in these coordinates,
and prove condition (iii).

b) Deduce via Theorem 4.04 that the definition of $\nabla$ in Exercise 2.1
gives exactly the Levi-Civita connection for the metric induced by the
embedding.

4. The Levi-Civita connection for any constant metric tensor on an affine
space has parallel transport

$$\tau = d_q^{\leftarrow} d_p : T_pX \to T_qX$$

independently of the curve.

5. a) Using the charts on the sphere $S^2$ set up in Exercise VII.2.1, find the
components $g_{ij}$ of the metric induced by the standard embedding in
$\mathbf{R}^3$ with the standard metric. (These are not just $\delta_{ij}$, since the $\partial_i$'s
produced are not orthonormal except at special points.)

b) Find the $\Gamma^k_{ij}$ for the corresponding Levi-Civita connection.

c) Show that around any $p \in S^2$ there is a chart making the $\Gamma^k_{ij}$ all zero
at $p$ (though not around $p$).

d) Repeat for the general sphere $S^n \subset \mathbf{R}^{n+1}$.

6. Use Exercise 3.1, Exercise 3.4v and 6.06 to give in components a metric
tensor on $TM$ such that the resulting orthogonal projections $P_v$ onto
vertical subspaces give the Levi-Civita connection by

$$\nabla_{u_p} v = d_v\bigl(P_v(D_pv(u_p))\bigr) .$$

7. Show that the connection in Exercise 4.4 is incompatible with any
metric.

8. If $M$, $N$ and $M \times N$ have metrics $G^M$, $G^N$, and $G^{M \times N}$ (Exercise
VII.3.2v), show that the Levi-Civita connection on $M \times N$ is the
product (Exercise 4.1) of those on $M$, $N$, and that a vector field along
$c : t \mapsto (c^M(t), c^N(t)) \in M \times N$ is parallel if and only if its $M$, $N$
components are parallel along $c^M$, $c^N$ in $M$, $N$ respectively.
7. Covariant Differentiation of Tensors

In sections 2 to 6 of this chapter we have established how to differentiate
$\binom{1}{0}$-tensor fields when we have a metric. We already knew how to differentiate
$\binom{0}{0}$-tensor fields: what about other types? Fortunately, we do not have to
invent new machinery for each; what we already have will rapidly set up
differentiation for them all.

7.01. Transporting Tensors. If $\tau_t : T_pM \to T_{c(t)}M$ is parallel transport
of vectors along $c : J \to M$ from $p$ to $c(t) = q$, define parallel transport of
$\binom{k}{h}$-tensors along $c$ from $p$ to $q$ by

$$\begin{aligned}
(\tau_t)^k_h : T_pM \otimes \cdots \otimes T_pM \otimes T_p^*M \otimes \cdots \otimes T_p^*M
&\to T_qM \otimes \cdots \otimes T_qM \otimes T_q^*M \otimes \cdots \otimes T_q^*M \\
v_1 \otimes \cdots \otimes v_k \otimes f_1 \otimes \cdots \otimes f_h
&\mapsto \tau_t v_1 \otimes \cdots \otimes \tau_t v_k \otimes (\tau_t^{\leftarrow})^* f_1 \otimes \cdots \otimes (\tau_t^{\leftarrow})^* f_h
\end{aligned}$$

on simple vectors. (Recall that $(\tau_t^{\leftarrow})^* f$ just means the functional whose value
on a vector $v$ at $q$ is obtained by transporting $v$ back to $p$ and evaluating
$f$ on the result (cf. III.1.03): we have $\tau_t^* f = f \circ \tau_t$, $(\tau_t^{\leftarrow})^* f = f \circ \tau_t^{\leftarrow}$.)
Evidently this reduces back to $\tau_t$ on $\binom{1}{0}$-tensors, and for $\binom{0}{0}$-tensors it is just
the identity $\mathbf{R} \to \mathbf{R}$, the usual tensor product of zero copies of a map.
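In components the rule for covariant vectors is simply multiplication by the inverse matrix. The sketch below is a small illustration under assumptions of its own: the $2 \times 2$ matrix `a` stands in for $\tau_t$ relative to chosen bases at $p$ and $q$ (it is a hypothetical example, not derived from any particular connection), and the helper names are illustrative. It checks that the pairing of a covector with a vector is unchanged when both are transported, which is the fact used later to show that transport commutes with contraction.

```python
def mat_vec(a, v):
    """Apply the 2x2 matrix a to the component list v."""
    return [a[0][0] * v[0] + a[0][1] * v[1],
            a[1][0] * v[0] + a[1][1] * v[1]]

def inverse_2x2(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def transport_covector(a, f):
    """Components of f o tau^{-1}, i.e. f_j (A^{-1})^j_i."""
    ainv = inverse_2x2(a)
    return [sum(f[j] * ainv[j][i] for j in range(2)) for i in range(2)]

def pair(f, v):
    """Value of the covector f on the vector v."""
    return sum(fi * vi for fi, vi in zip(f, v))

a = [[2.0, 1.0], [1.0, 1.0]]   # hypothetical matrix of tau_t
v = [3.0, 4.0]                 # a vector at p
f = [5.0, -2.0]                # a covector at p

before = pair(f, v)                                   # f(v) at p
after = pair(transport_covector(a, f), mat_vec(a, v))  # transported pairing at q
```

Here `before` and `after` are both 7: transporting $v$ forward by $\tau_t$ and $f$ by $(\tau_t^{\leftarrow})^*$ leaves $f(v)$ untouched, whatever invertible matrix represents $\tau_t$.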

7.02. Definition. If $v$ is a $\binom{k}{h}$-tensor field on $M$, $u \in T_pM$, and $c : J \to M$
any representative of $u$, we define the covariant directional derivative of $v$
with respect to $u$ as

$$\nabla_u v = \lim_{t \to 0} \frac{(\tau_t^{\leftarrow})^k_h\, v_{c(t)} - v_p}{t} .$$

That this is independent of the choice of representing curve $c$ follows
from:

7.03. Theorem. The derivative defined in 7.02 has the following properties.

A) If $f \in T^0_0M$, $\nabla_u f = u(f)$.

B) If $v \in T^1_0M$, $\nabla_u v$ is the vector given by the connection we used to
define $\tau$.

C) $\nabla_u(v + w) = \nabla_u v + \nabla_u w$, for $v, w \in T^k_hM$.

D) $\nabla_u(av) = a\nabla_u v$, for $v \in T^k_hM$, $a \in \mathbf{R}$.

E) $\nabla_u(v \otimes w) = (\nabla_u v) \otimes w + v \otimes (\nabla_u w) \in T^{k+l}_{h+m}M$, for $v \in T^k_hM$,
$w \in T^l_mM$.

F) If $C : T^k_hM \to T^{k-1}_{h-1}M$ is a contraction map (VII.3.04), $\nabla_u(C \circ v) =
C(\nabla_u v)$ for $v \in T^k_hM$.

G) For $u, u' \in T_pM$, $a \in \mathbf{R}$,

$$\nabla_{u+u'}v = \nabla_u v + \nabla_{u'}v , \qquad \nabla_{au}v = a\nabla_u v .$$

Proof. 

A) is essentially just (one) definition of u(f) (Exercise 1.2). 

B) is Theorem 4.06. 

C) and D) follow from the linearity of $\tau_t$, which implies that of $(\tau_t)^h_k$.

E) is straightforward and left to the reader (Exercise 1), with a full
outline.

Notice that if either $v$ or $w$ is just a function, $v \otimes w$ reduces to a pointwise
scalar multiple. Thus E is a generalised Leibniz rule, reducing to the usual
one when both $v$ and $w$ are functions (Exercise VII.4.4) and to 3.01 C iv)
when $v$ is a function and $w$ a contravariant vector field.

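By E it clearly suffices to prove F for $\binom{1}{1}$-tensor fields, since we can
arrange any contraction as

$$T^h_kM \cong T^1_1M \otimes T^{h-1}_{k-1}M \xrightarrow{\;C \otimes I\;} T^0_0M \otimes T^{h-1}_{k-1}M \cong T^{h-1}_{k-1}M .$$

This simplifies notation in the proof

$$C\big((\tau_t)^1_1(v_p \otimes f_p)\big) = C\big(\tau_t v_p \otimes (f_p \circ \tau_t^{\leftarrow})\big) , \text{ by 7.01,}$$

$$= f_p\big(\tau_t^{\leftarrow}(\tau_t v_p)\big)$$

$$= f_p(v_p)$$

$$= C(v_p \otimes f_p)$$

$$= (\tau_t)^0_0\big(C(v_p \otimes f_p)\big) , \text{ since } (\tau_t)^0_0 = I_{\mathbf{R}},$$

that parallel transport commutes with contraction.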
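The computation above uses only that the transport map is a linear isomorphism, with covectors carried by composition with its inverse. A minimal numerical sketch of that one step follows; the matrix `A` is a hypothetical stand-in for $\tau_t$ in some bases (it is not an isometry, and is not taken from the text), and the helper names are ours.

```python
# Sketch: contraction commutes with transport when vectors go by A
# and covectors go by f -> f o A^{-1}. A is a hypothetical stand-in for tau_t.

def mat_vec(A, v):
    """Components of A v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def covec_mat(f, A):
    """Components of the covector f o A (row vector times matrix)."""
    return [sum(f[i] * A[i][j] for i in range(len(f))) for j in range(len(A[0]))]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def contract(v, f):
    """C(v (x) f) = f(v)."""
    return sum(fi * vi for fi, vi in zip(f, v))

A = [[2.0, 1.0], [0.5, 3.0]]          # any invertible map will do
v_p = [1.0, -2.0]
f_p = [4.0, 0.5]

v_q = mat_vec(A, v_p)                  # transported vector
f_q = covec_mat(f_p, inverse_2x2(A))   # transported covector

before = contract(v_p, f_p)
after = contract(v_q, f_q)
print(before, after)
```

The two printed numbers agree up to rounding, which is the statement $C \circ (\tau_t)^1_1 = (\tau_t)^0_0 \circ C$ in this small case.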

F) follows immediately, since C is continuous and so commutes with 
limits. 

G) is left as an easy exercise (lb-d). □ 

7.04. Corollary. $\nabla_u v$ as defined above is independent of the choice of representing curve $c$.

Proof. By 7.03 E, $\nabla_u$ is determined by its values on vector fields.

By Exercise 1b, $\nabla_u$ is determined on covariant vector fields by its values
on contravariant ones.

By 7.03 B, $\nabla_u$ coincides on $T^1M$ with the original connection, whose
values do not depend on $c$. □

7.05. Definition. In VII.1.02 we met a directional derivative with respect
to $u$ as the image of $u$ under the derivative of a map. Similarly we define
the covariant derivative at $p$ of an $\binom{h}{k}$-tensor field $v$ to be the map

$$\nabla_p v : T_pM \to (T^h_kM)_p : u \mapsto \nabla_u v .$$

Linear by 7.03 G, this corresponds canonically by V.1.08 to a vector in the
space

$$(T_pM)^* \otimes \big( \underbrace{T_pM \otimes \cdots \otimes T_pM}_{h \text{ times}} \otimes \underbrace{T_p^*M \otimes \cdots \otimes T_p^*M}_{k \text{ times}} \big) .$$

Switching the dual space on the left round to the right (Exercise V.8b) for
convenience, we get an element of $T^h_{k+1}M$, which we shall also denote by
$\nabla_p v$. (Applying this $\nabla_p v \in T^h_{k+1}M$, as our initial linear map, to $u \in T_pM$
just means taking the tensor product $(\nabla_p v) \otimes u$ and contracting
over the last two places. In coordinates the isomorphism $L(T_pM ; (T^h_kM)_p) \cong (T^h_{k+1}M)_p$ vanishes into invisibility, since the same “sets of numbers” serve
as components on both sides.)

Evidently $\nabla_p v$ depends smoothly on $p$ by 3.01 C v) and Theorem 7.03.
So we have a new smooth tensor field $\nabla v$ on M, the covariant differential
of $v$, of the same contravariant order as $v$ and covariant order one higher.
(Hence, it is sometimes asserted, we use the term “covariant”. But at the
time it was christened, “covariant” was also used to mean “independent of
the choice of coordinates”. It seems more plausible that the name is just due
to this property, which took a lot of work to reach when working entirely in
components (Exercise 2b). Over to the historians.)

7.06. Ricci’s Lemma. If $\nabla$ is the Levi-Civita connection for G, G has the
covariant differential $\nabla G = 0$.

Proof. By Definitions 6.05 and 6.01, all the $\tau_t$ for $\nabla$ are isometries. But
this means exactly that $(\tau_t)^0_2(G_p) = G_q$, as expansion of the definitions will
show. Hence

$$(\tau_t^{\leftarrow})^0_2(G_q) - G_p = 0 \text{ always.}$$

Application of Definitions 7.02 and 7.05 gives the result. □

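7.07. Corollary. Covariant differentiation commutes with applications of
$G_\downarrow$ and $G^\uparrow$ to “raise and lower indices”.

Proof. By 7.03 it suffices to work with vector fields.

Lowering indices; for $v \in T^1M$,

$$\nabla(G_\downarrow v) = \nabla\big(C(G \otimes v)\big)$$

$$= C\big(\nabla(G \otimes v)\big)$$

$$= C(\nabla G \otimes v + G \otimes \nabla v)$$

$$= C(G \otimes \nabla v) \quad \text{since } \nabla G = 0$$

$$= (G_\downarrow \otimes I_{T^*M})\nabla v$$

that is,

$$\nabla \circ G_\downarrow = (G_\downarrow \otimes I_{T^*M}) \circ \nabla .$$

Raising indices; from the above,

$$(G^\uparrow \otimes I_{T^*M}) \circ (\nabla \circ G_\downarrow) \circ G^\uparrow = (G^\uparrow \otimes I_{T^*M}) \circ \big((G_\downarrow \otimes I_{T^*M}) \circ \nabla\big) \circ G^\uparrow$$

$$\big((G^\uparrow \otimes I_{T^*M}) \circ \nabla\big) \circ G_\downarrow \circ G^\uparrow = (G^\uparrow \otimes I_{T^*M}) \circ (G_\downarrow \otimes I_{T^*M}) \circ (\nabla \circ G^\uparrow)$$

$$(G^\uparrow \otimes I_{T^*M}) \circ \nabla = \nabla \circ G^\uparrow .$$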


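7.08. Components. If in coordinates $u = u^\eta \partial_\eta$ and $w \in T^h_kM$ has components $w^{i_1 \dots i_h}_{j_1 \dots j_k}$, then we define the components $w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta}$ of $\nabla w$ by the equation

$$\nabla_u w = w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta}\, u^\eta \,( \partial_{i_1} \otimes \cdots \otimes \partial_{i_h} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_k} ) .$$

We leave it to the reader (Exercise 2a) to prove from Theorem 7.03 and the
formulae of 3.02 that for these components

$$(*) \qquad w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta} = \partial_\eta w^{i_1 \dots i_h}_{j_1 \dots j_k} + \sum_{l=1}^{h} \Gamma^{i_l}_{\eta m}\, w^{i_1 \dots m \dots i_h}_{j_1 \dots j_k} - \sum_{l=1}^{k} \Gamma^{m}_{\eta j_l}\, w^{i_1 \dots i_h}_{j_1 \dots m \dots j_k} ,$$

where in each term of the sums $m$ replaces the $l$-th index of the relevant kind.

(The trick is first to extend 3.02, to get $\nabla_{\partial_\eta} dx^j$, by Exercise 1b. Then extend
to general $w$ by 7.03 E.)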
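To see the component rule at work on a concrete case, here is a small numerical sketch for $\binom{0}{2}$-tensor fields on the unit sphere in the chart $(\theta, \phi)$. It assumes the rule in the form $w_{ij;\eta} = \partial_\eta w_{ij} - \Gamma^m_{\eta i} w_{mj} - \Gamma^m_{\eta j} w_{im}$, and it hard-codes the standard Christoffel symbols of the round metric rather than deriving them; the function names are ours, and partial derivatives are taken by central differences.

```python
import math

def metric(theta, phi):
    """Round metric on the unit sphere: diag(1, sin^2 theta)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def identity_components(theta, phi):
    """A field whose components are delta_ij in this chart (not the metric)."""
    return [[1.0, 0.0], [0.0, 1.0]]

def christoffel(theta, phi):
    """G[m][eta][i] = Gamma^m_{eta i}; index 0 is theta, index 1 is phi."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def covariant_derivative_02(w, point, h=1e-6):
    """Return D[i][j][eta] = w_{ij;eta} at point for a (0 2)-tensor field w."""
    G = christoffel(*point)
    w0 = w(*point)
    D = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for eta in range(2):
        plus, minus = list(point), list(point)
        plus[eta] += h
        minus[eta] -= h
        wp, wm = w(*plus), w(*minus)
        for i in range(2):
            for j in range(2):
                partial = (wp[i][j] - wm[i][j]) / (2 * h)
                correction = sum(G[m][eta][i] * w0[m][j] + G[m][eta][j] * w0[i][m]
                                 for m in range(2))
                D[i][j][eta] = partial - correction
    return D

point = (1.0, 0.3)
DG = covariant_derivative_02(metric, point)
ricci_residual = max(abs(DG[i][j][e]) for i in range(2) for j in range(2) for e in range(2))
DI = covariant_derivative_02(identity_components, point)
print(ricci_residual, DI[1][1][0])
```

The first number is zero up to rounding, as Ricci's Lemma 7.06 requires; the second is $-2\cot\theta$, showing that constant components in a chart do not make a field constant for the connection.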

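Notice that, if for some $\eta$ we have $u = \partial_\eta$, it has components $\delta^i_\eta$ (since
$\delta^i_\eta \partial_i = \partial_\eta$). Hence

$$\nabla_{\partial_\eta} w = \nabla_\eta w = w^{i_1 \dots i_h}_{j_1 \dots j_k;i}\, \delta^i_\eta \,( \partial_{i_1} \otimes \cdots \otimes \partial_{i_h} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_k} )$$

$$= w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta} \,( \partial_{i_1} \otimes \cdots \otimes \partial_{i_h} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_k} ) ,$$

giving an alternative definition of $w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta}$.

The generalised Leibniz rule 7.03 E becomes in coordinates

$$\big( v^{j_1 \dots j_h}_{i_1 \dots i_k}\, w^{a_1 \dots a_l}_{b_1 \dots b_m} \big)_{;\eta} = v^{j_1 \dots j_h}_{i_1 \dots i_k;\eta}\, w^{a_1 \dots a_l}_{b_1 \dots b_m} + v^{j_1 \dots j_h}_{i_1 \dots i_k}\, w^{a_1 \dots a_l}_{b_1 \dots b_m;\eta}$$

by plugging V.1.12 into the definition.

Note that some books use a different notation for our $w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta}$.

We introduce here and use subsequently an abbreviation common in
the literature: denote $\partial_\eta w^{i_1 \dots i_h}_{j_1 \dots j_k}$ by $w^{i_1 \dots i_h}_{j_1 \dots j_k,\eta}$. (Notice the vital difference
between comma and semi-colon: one means a derivative of a component,
the other a component of a derivative.) The computation rule * above then
becomes

$$w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta} = w^{i_1 \dots i_h}_{j_1 \dots j_k,\eta} + \sum_{l=1}^{h} \Gamma^{i_l}_{\eta m}\, w^{i_1 \dots m \dots i_h}_{j_1 \dots j_k} - \sum_{l=1}^{k} \Gamma^{m}_{\eta j_l}\, w^{i_1 \dots i_h}_{j_1 \dots m \dots j_k} ,$$

with the sums as before.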
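We also abbreviate $\big(w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta}\big)_{;\zeta}$ and $\big(w^{i_1 \dots i_h}_{j_1 \dots j_k,\eta}\big)_{,\zeta}$ to $w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta\zeta}$ and
$w^{i_1 \dots i_h}_{j_1 \dots j_k,\eta\zeta}$, respectively.

Notice that if $w$ is just a function $w : M \to \mathbf{R}$,

$$w_{;\eta} = w_{,\eta} = \partial_\eta(w) .$$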


In components, 7.07 takes the form, for instance,

$$(g_{ij}\, w^{jk})_{;\eta} = g_{ij}\, w^{jk}{}_{;\eta} , \qquad (g^{ij}\, w_{jk})_{;\eta} = g^{ij}\, w_{jk;\eta} .$$

Hence we can ignore the presence of semi-colons (but not commas) when
raising and lowering indices. Equally usefully:


7.09. Lemma. The covariant differential of the constant $\binom{1}{1}$-tensor field I
on M (cf. VII.3.05 iii) is 0.

Proof. Clearly parallel transport takes the identity on any tangent space to 
that on any other (just apply the definitions) and the result follows. □ 

The utility of 7.09 is its coordinate form: 

$$\delta^i_{j;\eta} = 0 .$$
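The coordinate form can also be checked directly from the rule * of 7.08. The short derivation below assumes that rule with the differentiating index written first among the lower indices of $\Gamma$; this ordering is our assumption about the convention, and the cancellation does not depend on it.

```latex
\delta^i_{j;\eta}
  = \partial_\eta \delta^i_j
    + \Gamma^i_{\eta m}\,\delta^m_j
    - \Gamma^m_{\eta j}\,\delta^i_m
  = 0 + \Gamma^i_{\eta j} - \Gamma^i_{\eta j}
  = 0 .
```

The components $\delta^i_j$ are constant functions in every chart, so the derivative term vanishes and the two connection terms cancel.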

This is often expressed by the statement that $\delta^i_j$, like $g_{ij}$ and $g^{ij}$, is “a constant
with respect to covariant differentiation”. It has the consequence, frequently
invaluable in manipulations, that “change of indices”, such as using $\delta^i_j v^j = v^i$,
commutes like raising and lowering indices with covariant differentiation. For
example,

$$v^i{}_{;\eta} = (\delta^i_j v^j)_{;\eta} = \delta^i_j\, v^j{}_{;\eta} \quad \text{(using the Leibniz rule),}$$

which can also be seen by considering the component functions directly. Notice that change of indices is essentially contraction of $I \otimes v$.

7.10. Definition. A tensor field t on M is constant relative to a connection 
V or metric tensor field G if the corresponding parallel transport along any 
curve takes its value at any point to its value at any other: if its covariant 
differential is 0 identically. 

Lemmas 7.06, 7.09 state that a metric is constant relative to itself and 
that the identity is constant relative to any connection. The other constant 
fields we have encountered so far are constant functions and parallel vector 
fields. Evidently, any multiple of a constant field by a scalar constant, or more 
generally a tensor product of constant fields, is again constant by 7.03 E. 

Notice that the constant fields on affine spaces of VII.3.05 need not 
necessarily be constant relative to metrics that are not constant in the sense 
used there. 
Exercises VIII.7

1. a) From 7.03 C, show that to prove E it suffices to consider $v = fv'$,
$w = gw'$ with each of $v'$, $w'$ a parallel field along $c$, and $f, g : J \to \mathbf{R}$.
(Hint: pick bases and parallel-transport them.)

Establish the equation

$$\frac{d(fg)}{dt}(0)\; v'_p \otimes w'_p = \Big( \frac{df}{dt}(0)\, v'_p \Big) \otimes \big( g(0)\, w'_p \big) + \big( f(0)\, v'_p \big) \otimes \Big( \frac{dg}{dt}(0)\, w'_p \Big)$$

by bilinearity and Exercise VII.4.4, and deduce E.

b) Prove from A, F, E that if $f \in T_1M$, $w \in T^1M$ then

$$(\nabla_u f)\, w = u\big(f(w)\big) - f(\nabla_u w) .$$

c) Deduce G when $v$ is the covariant vector field $f$.

d) Use E to prove G for tensors of the form $x_1 \otimes \cdots \otimes x_{h+k}$, where each $x_i$
is in $T^1M$ or $T_1M$, and C to extend it to general tensors.

2. a) Prove the equation * of 7.08. 

b) Use 7.08 and 3.03 to show that the $n^{h+k+1}$ functions $w^{i_1 \dots i_h}_{j_1 \dots j_k;\eta}$ obey

the transformation rule (cf. VII.4.04) for a $\binom{h}{k+1}$-tensor field.

(The work involved will show you why “covariance” with respect to 
these rules was such a triumph in the original approach.) 

c) Use 7.08 and 6.06 to prove Ricci’s Lemma (7.06) in its coordinate form 

$$g_{ij;\eta} = 0 , \qquad g^{ij}{}_{;\eta} = 0$$

and derive its corollary, again using 7.08. (There are some places 
where coordinates give the quickest and most convenient proof. On 
the other hand, there are places where geometry not only gives more 
insight but is much quicker.) 
“The voice of him that crieth in the wilderness,

Prepare ye the way of the Lord, 

make straight in the desert a highway for our God.” 

Isaiah 40,3 

The ancient custom in the Eastern Mediterranean of the straight, royal road 
for the exclusive use of the semi-divine ruler (cf. Menaechmus telling Alexander
there was no royal road to geometry - he had to go the same way as everyone 
else) involved a clear, if unformulated, idea of “straight”. With the rigid 
formalisation of geometry into the Euclidean system, “straight” became a 
more restricted notion which clearly would not fit a road that bent over the 
horizon, as a long enough road must. Hence a new word was needed. Earth 
had been considered a perfect sphere since early Greek times, and on such if 
you keep “straight on”, deviating neither to the left nor to the right, for long 
enough you return to your starting point and your starting direction. Your 
path, then, unambiguously divides the earth into two parts, to its left and to
its right: hence the chosen word for such a path was “geodesic” or “divides
the earth”. This name has become fixed for an undeviating path, though
only on a perfect sphere does such a path always have this dividing property
(and the earth is no such thing).


1. Local Characterisation 

When is a curve “undeviating”? Its direction at $c(t)$ is given by $c^*(t) \in T_{c(t)}M$,
so not deviating must mean that $c^*(t)$ does not vary with $t$, in some
sense. Since we move it from one tangent space to another, it cannot be
“constant” in the strict sense of that word. The previous chapter, though, was
almost entirely devoted to the study of what “rate of change along a curve”
ought to mean for vectors. For a manifold M with a metric tensor G, Levi-Civita connection $\nabla$ and associated differentiation $\nabla_{c^*}$ along $c$ (VIII.3.05),
then, the natural definition is

1.01. Definition. A curve $c$ is a geodesic if $\nabla_{c^*}c^*(t) = 0$ $\forall t$; that is, if its
tangent vector field (VIII.3.04) is parallel.

If $c$ is thought of as describing the motion of a particle, $c^*(t)$ becomes
“velocity at time $t$” and $\nabla_{c^*}c^*$ becomes “rate of change of velocity” or “acceleration”. So the geodesic is the path of a particle “subject to no forces”,
constrained only by the geometry of the manifold. (We give another interpretation in §3.)

Fig. 1.1

It is clear that this definition depends on G, since $\nabla$ does. This is
entirely reasonable: Fig. 1.1 illustrates a diffeomorphism between two manifolds (which are thus “the same” topologically and differentially) carrying an
intuitively “undeviating” curve to an obviously “bent” one.

Since parallel transport is an isometry, parallelism is a somewhat stronger
condition than simply that $c^*(t)$ not be “turning” with respect to the connection: it must stay the same size. This is the most convenient formulation,
as if we allowed the size to change, $c^*(t)$ could go to zero unless we added a
separate condition to forbid it - and when you have stopped, you no longer
have a “direction you are going in” to preserve. One consequence of this is

1.02. Lemma. A geodesic is always a timelike, null or spacelike curve (VII.5.03). □

In spacetime, null and timelike geodesics are often called world-lines 
though usually this term is allowed to include other timelike or null curves 
(cf. XI.1.02). 

We already see that geodesics have a more elaborate geometry if M has
an indefinite metric than in the Riemannian case. Various facts of strictly
Riemannian geometry fail for indefinite metrics. For example, if M is Riemannian, connected and “geodesically complete” (defined in 2.03), any two
points in M can be joined by a geodesic, while in the example of §6 this fails
even for points connected by a timelike curve. We leave such “strictly Riemannian” results to the “strictly pure” mathematics texts. (Even if spacetime
does have a geodesic between any two points this would be without physical significance - pending the discovery of tachyons - since if $x$, $y$ can be
joined only by a spacelike geodesic, events at either cannot affect events at
the other.)

1.03. Closed Geodesics. In Euclidean geometry (or, for that matter, Lobachevskian: Exercise 2.3) no straight line “meets itself” as a great circle does.
But on the sphere geodesics are “closed curves” in the obvious sense (Exercise 1). This suggests that we define a closed geodesic to be a smooth curve
$c : \mathbf{R} \to M$ with $c^*$ parallel and some $0 \neq k \in \mathbf{R}$ such that $c(t + k) = c(t)$,
$\forall t \in \mathbf{R}$. We can also define a crossed geodesic to be a geodesic $c$ with domain
either $J \subset \mathbf{R}$, or $S^1$, and $c(x) = c(y)$ for some $x \neq y$. On a compact Riemannian 2-manifold, unless there is a good deal of symmetry, a typical geodesic
will not be closed but will cross itself infinitely many times. In particular on 
the earth the bulges away from sphericity of the “geoid”, which is geodesy’s 
name for whatever shape the ideal “sea-level” surface currently has, mean 
that geodesics which really divide the earth in two are highly unusual. (A 
theorem of global analysis [Liusternik and Schnirelman] says that on any 
manifold homeomorphic to S 2 , given any metric tensor, there must be at 
least three closed geodesics. But it gives no indication how to find them. For 
generalisations of this to n-manifolds, see [Klingenberg].) 

We do not draw pictures to illustrate how irregularities in the geoid cause 
geodesics to deviate, cross, etc.; by §3, you can get a clearer notion than from 
any figure by pulling string tight around a potato. 

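1.04. Components. Relative to a chart, a curve $c$ takes the form $t \mapsto (c^1(t), \dots, c^n(t))$. In the corresponding basis for $T_{c(t)}M$, $c^*(t)$ has components $\big( \frac{dc^1}{ds}(t), \dots, \frac{dc^n}{ds}(t) \big)$. The geodesic equation,

$$\nabla_{c^*(t)}c^* = 0 ,$$

thus takes the form (using VIII.3.02)

$$\left( \frac{d^2c^k}{ds^2} + \Gamma^k_{ij}\, \frac{dc^i}{ds}\frac{dc^j}{ds} \right) \partial_k = 0 ,$$

that is,

$$\frac{d^2c^k}{ds^2} + \Gamma^k_{ij}\, \frac{dc^i}{ds}\frac{dc^j}{ds} = 0 , \qquad \forall k .$$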

Exercises IX. 1 

1. Use Exercise VIII.6.5 to show that the geodesics on a sphere $S^n$ of
any dimension, with the usual metric tensor, are the great circles.

2. Compute, and draw, the geodesics given by the asymmetric connection 
on R of Exercise VIII.6.1. (Notice that they are of two kinds - what 
goes up need not come down.) 

3. a) We can define a manifold $RP^2$, the real projective plane, as the set of
unordered pairs $\{x, -x\}$ where $x \in S^2$ (so that $\{x, -x\}$ is the same as
$\{-x, x\}$) with charts defined from those of Exercise VII.2.1 by

$$U_i = \{\, \{x, -x\} \mid x \in U_{i+} \,\}$$

$$\phi_i(\{x, -x\}) = \phi_{i+}(\text{whichever of } x, -x \text{ is in } U_{i+}) .$$

(This manifold, “$S^2$ with opposite points identified”, sits in $\mathbf{R}^3$
even less comfortably than the Klein bottle, but embeds in $\mathbf{R}^4$.)

b) Define a Riemannian metric on $RP^2$ such that, with the standard
metric on $S^2$, the derivative of $Q : S^2 \to RP^2 : x \mapsto \{x, -x\}$ at any
point is an isometry.

c) Show - easier without coordinates - that the geodesics on $RP^2$ are
the images by $Q$ of the great circles. Deduce that any two distinct
geodesics meet at exactly one point.

(Taking “straight line” to mean “geodesic in $RP^2$” this contradicts
Euclid’s parallel postulate, which asserts the existence of straight lines
that never meet. But since all his other axioms are true for such
“lines”, if the others implied the parallel postulate that also would
hold for them. Two millennia of attempts to prove the parallel axiom
from the others hit this and similar rocks last century.

$RP^2$ is the standard elliptic non-Euclidean geometry (cf. also Exercise 2.3).)

4. Show that in the situation of Exercise VIII.6.8, a curve
$c : J \to M \times N : t \mapsto (c_M(t), c_N(t))$
is a geodesic if and only if $c_M$, $c_N$ are geodesics in M, N.


2. Geodesics from a Point 


2.01. The Horizontal Field. From any point $p \in M$, we would expect to
be able to go off with any given “starting velocity” vector $v_p \in T_pM$, and by
“not deviating” get a well defined geodesic through $p$, with tangent vector
$v_p$ at 0. The proof of this is a question in differential equations, as follows.

At each point $v \in TM$ there is a unique horizontal vector $\tilde v \in T_v(TM)$
with $D_v\Pi(\tilde v) = v$, by Exercise VIII.3.6b. (In Fig. 2.1, $T_pM$ and $v$ are drawn
twice, using the embedded picture and the bundle picture.) This gives a
(clearly smooth) vector field $x$ on $TM$, with $x(v) = \tilde v$, with the properties

(i) $D_w\Pi(x_w) = v \iff w = v$.

(ii) Each $x_v$ is a horizontal vector.

If $\tilde c$ is a solution curve in $TM$ of $x$ (cf. VII.6.01), and $c = \Pi \circ \tilde c$
is its projection down to M, then $\tilde c$ becomes a vector field along $c$ with
$D_{\tilde c(t)}\Pi(\tilde c^*(t)) = c^*(t)$, as in VIII.3.06. Since by assumption $\tilde c^*(t) = x(\tilde c(t))$,
by (i) we have $\tilde c(t) = c^*(t)$; $\tilde c$ is exactly the tangent vector field $c^*$ along $c$.
But $\tilde c^*(t)$ is horizontal for all $t$, and examining Definition VIII.3.07 this means
exactly that

$$\nabla_{c^*}c^* = 0 ,$$

so $c$ is a geodesic.
So, appealing to Theorem VII.6.04 for the existence and uniqueness of
such a $\tilde c$ through any $v \in TM$, we have

2.02. Theorem. For any given $p \in M$, $v \in T_pM$, there is a geodesic $c : J \to M$ with $c(0) = p$, $c^*(0) = v$, unique in the sense that any other such
geodesic $f : K \to M$ has $f|_{K \cap J} = c|_{K \cap J}$. □

(Notice that the geodesic equation 1.01, 1.04 is a second order ordinary
differential equation, and that geometrically this means a vector field on $TM$.
Similarly a 3rd order equation is a vector field on $T(TM)$, and so forth.)

2.03. Geodesic Completeness. Theorem 2.02 is a local fact. Unlike 
VIII.4.02, which says we have parallel transport as far as we like along a given 
path, a geodesic through $p$ cannot necessarily be extended to a geodesic with
domain all of $\mathbf{R}$. (For example if $M = \mathbf{R}$ with the metric $g_{11}(x) = \dots$,
the only geodesic with $c(0) = 0$, $c^*(0) = 2e$ is $t \mapsto \dots$, by Exercise 4.) If all
geodesics on M can be extended in this way, M is called geodesically complete. Quite mild conditions (a strong one is compactness, but others work)
guarantee that a Riemannian manifold is geodesically complete (cf. 3.10 and
see [Kobayashi and Nomizu]), but recent work (see [Hawking and Ellis]) has
shown that physically reasonable assumptions make it impossible for a spacetime to be complete with reference to the interesting geodesics: the timelike
and null ones. (In fact the situation is worse. For there are spacetimes
that are geodesically complete but which have incomplete timelike curves of
bounded acceleration, so particles or people in them may vanish. In consequence, a new idea of completeness has been devised in terms of a certain
bundle over the spacetime.) An obstruction to extending geodesics forward
in time may be a black hole or collapse (local or global); a barrier backwards,
a bang or (in theories in which all geodesics extended backwards meet the
same singularity) a Big Bang (cf. XII.2.04).

Locally however we do have geodesics in all directions from a point, and 
with them we construct a special map that has various useful features. The 
idea is to carry a tangent vector $v \in T_pM$ to the effect of “travelling unit
time” by the geodesic with initial vector v. (Thus 0 will go to p, 2v will 
go “twice as far” as v, etc.) If M is not geodesically complete, the geodesic 
through v may not be extendable to a domain that contains 1, but a small 
enough v starts us travelling sufficiently slowly to meet no obstruction before 
unit time is up. (How small is “small enough” will generally vary from one 
point in M to another.) 

2.04. Definition. The exponential map from a subset $E \subset T_pM$ to M is
defined as follows.

$$E = \{\, v \mid \exists \text{ a geodesic } c_v \text{ s.t. } c_v(0) = p,\ c_v^*(0) = v,\ \&\ c_v(1) \text{ is defined} \,\}$$

$$\exp_p : E \to M : v \mapsto c_v(1) .$$

By Exercise 1, E contains an open neighbourhood of $0 \in T_pM$. (In fact
E is itself open and so a neighbourhood of 0, but the proof is somewhat
technical and we shall not need it.)

The map $\exp_p$ is well defined on E by the uniqueness property in 2.02.
(The name “exponential” is due to a special case. Using the usual metric on
$S^1$, considered as the set $\{\, z \mid |z| = 1 \,\}$ of complex numbers, and the obvious
parameter on the tangent space at 1, $\exp_1$ is given exactly by $x \mapsto e^{ix}$
(Fig. 2.2). A more elaborate example, not involving complex notation, is
discussed in §6.)

If M is geodesically complete, of course, $\exp_p$ is defined on all of $T_pM$;
if any $p, q \in M$ can be joined by a geodesic, $\exp_p$ is surjective. Thus it is not
in general injective, as a typical manifold is not smoothly bijective with any
vector space. Fig. 2.3 illustrates $\exp_{(N.Pole)}$ on $S^2$ with its usual metric, for
those vectors in $T_{(N.Pole)}S^2$ of length $< \pi$ (what happens to the longer ones?)
The images of the straight lines shown in the tangent space are the curves
shown on $S^2$: notice that though by Exercise 1f the straight lines through 0
are carried to geodesics, no other straight lines in $T_{(N.Pole)}S^2$ are. Indeed, if
all others were then the sphere would be “flat” in the sense we discuss next
chapter.
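These remarks about $S^2$ can be tried out numerically. The sketch below assumes the unit sphere in $\mathbf{R}^3$ with north pole $(0, 0, 1)$, and uses the great-circle formula $\exp(v) = \cos|v|\,N + \sin|v|\,v/|v|$ for the exponential map there (cf. Exercise 1.1); that closed form and the helper names are ours, not the text's.

```python
import math

def exp_north_pole(v):
    """exp at N = (0, 0, 1) on the unit sphere; v = (v1, v2) in the tangent plane."""
    r = math.hypot(v[0], v[1])
    if r == 0.0:
        return (0.0, 0.0, 1.0)
    s = math.sin(r) / r
    return (s * v[0], s * v[1], math.cos(r))

def det3(a, b, c):
    """Determinant with rows a, b, c; zero when 0, a, b, c lie in one plane."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# 2v goes "twice as far" as v: the angle from the pole equals |v|.
angle_v = math.acos(exp_north_pole((0.3, 0.4))[2])
angle_2v = math.acos(exp_north_pole((0.6, 0.8))[2])

# Not injective: different vectors of length pi have the same image.
image_a = exp_north_pole((math.pi, 0.0))
image_b = exp_north_pole((0.0, -math.pi))

# A line through 0 goes to a great circle (images coplanar with the centre) ...
through_origin = det3(exp_north_pole((0.5, 0.5)),
                      exp_north_pole((1.0, 1.0)),
                      exp_north_pole((1.5, 1.5)))
# ... but the line v1 = 1 does not.
off_origin = det3(exp_north_pole((1.0, -1.0)),
                  exp_north_pole((1.0, 0.0)),
                  exp_north_pole((1.0, 1.0)))
print(angle_v, angle_2v, image_a, image_b, through_origin, off_origin)
```

A great circle lies in a plane through the centre of the sphere, so a non-zero determinant for three image points shows that the image of the line $v_1 = 1$ is not part of any geodesic.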

The exponential map is not, then, a diffeomorphism. (The study of its
singularities, the places where not even $D_v(\exp_p)$ is injective, is an active
topic in differential geometry.) But it is always smooth (Exercise 2), and
near 0 even better:

2.05. Lemma. $0 \in T_pM$ has a neighbourhood $U \subset T_pM$ such that $\exp_p|_U$
is a diffeomorphism.

Proof. By Exercise 1f, $c : t \mapsto \exp_p(tv)$ has $c(0) = p$, $c^*(0) = v$. But evidently
$\tilde c : t \mapsto tv$ represents $v$ (considered as a tangent vector to $T_pM$ at 0 in the
natural way), so $c = \exp_p \circ\, \tilde c$ represents $D_0\exp_p(v)$. Thus $v = c^*(0)$ is in the
image of $D_0\exp_p$. Since this holds for any $v$, $D_0\exp_p$ is a surjective linear
map and so, since $\dim(T_0(T_pM)) = \dim T_pM$, an isomorphism by I.2.13.

The result follows by VII.1.04. □

2.06. Normal Coordinates. By 2.05 we have an open set $V = \exp_p(U) \subset M$ and a $C^\infty$ map $\exp_p^{\leftarrow} : V \to T_pM$. The pair $(V, \exp_p^{\leftarrow})$ constitutes an
admissible chart (VII.2.01), using the canonical affine structure of $T_pM$. This
chart played a crucial role in the early development of differential geometry,
because of the simplicity it can bring to a wilderness of coordinates. Choosing
a basis $\beta = \{b_1, \dots, b_n\}$ for $T_pM$, orthonormal with respect to $G_p$, we get
an isomorphism $B : T_pM \to \mathbf{R}^n$, and a chart $B \circ \exp_p^{\leftarrow}$ with domain $V$ and
range in $\mathbf{R}^n$. Such a chart is called a system of normal coordinates about $p$,
with respect to G. (Notice that we have as many as there are choices of
orthonormal basis for $T_pM$.) It is very convenient in computations - if used
with care - because:

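2.07. Lemma. With respect to a system of normal coordinates about $p \in M$,

A) $g_{ij}(p) = \pm\delta_{ij}$, $\forall i, j$

B) $\Gamma^k_{ij}(p) = 0$, $\forall i, j, k$

C) $\partial_k g_{ij}(p) = 0$, $\forall i, j, k$

Proof.

A)

$g_{ij}(p) = \partial_i(p) \cdot \partial_j(p)$ by definition

$= c_i^*(0) \cdot c_j^*(0)$, where $c_i(t) = \exp_p\big(B^{\leftarrow}(0, \dots, 0, t, 0, \dots, 0)\big)$ with $t$ in the $i$-th place, by Exercise VIII.1.1a

$= b_i \cdot b_j$, since $c_i(t) = \exp_p(t b_i)$ so that $c_i^*(0) = b_i$ by Exercise 1f

$= \pm\delta_{ij}$, since $\beta$ was chosen orthonormal.

B) $\nabla_{\partial_i} v = \nabla_{c_i^*} v$ along $c_i$, for any field $v$, since $c_i^*(t) = (\partial_i)_{c_i(t)}$. Hence $\nabla_{\partial_i}\partial_i = 0$
$\forall i, t$ along $c_i$, since $c_i$ is a geodesic. Similarly,

$$\nabla_{(\partial_i + \partial_j)}(\partial_i + \partial_j) = \nabla_{c_{i+j}^*}(\partial_i + \partial_j) ,$$

along $c_{i+j} : t \mapsto \exp_p\big(B^{\leftarrow}(0, \dots, t, \dots, t, \dots, 0)\big)$ with $t$ in the $i$-th and $j$-th places,

$$= 0 .$$

Hence by C i), C ii) of VIII.3.01,

$$0 = \nabla_{(\partial_i + \partial_j)(p)}(\partial_i + \partial_j)$$

$$= \nabla_{\partial_i(p)}\partial_i + \nabla_{\partial_i(p)}\partial_j + \nabla_{\partial_j(p)}\partial_i + \nabla_{\partial_j(p)}\partial_j$$

$$= 0 + \nabla_{\partial_i(p)}\partial_j + \nabla_{\partial_j(p)}\partial_i + 0$$

$$= 2\nabla_{\partial_i(p)}\partial_j \quad \text{by symmetry of } \nabla \text{, since } [\partial_i, \partial_j] = 0 .$$

Thus all the vectors $\nabla_{\partial_i(p)}\partial_j$ at $p$, and hence their components the $\Gamma^k_{ij}(p)$, vanish.

C) is an immediate consequence of B) and the last equation of VIII.6.06,
since all the $\Gamma_{ijk}$ vanish when all the $\Gamma^k_{ij}$ do. □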

Notice that the above simplifications occur only at $p$; in general $g_{ij}$ and
$\Gamma^k_{ij}$ will be more awkward at all other points. (The $\Gamma^k_{ij}$ just pass through 0
at $p$, for this chart.) The proof in part B that $\Gamma^k_{ii} = 0$ is valid only along
the $x^i$-axis, since lines in $\mathbf{R}^n$ parallel to the $x^i$-axis need not correspond to
geodesics (as we noticed for $S^2$). Hence the proof that $\Gamma^k_{ij} = 0$, which needs
both $\Gamma^k_{ii} = 0$ and $\Gamma^k_{jj} = 0$, applies only to points on both the $x^i$ and the
$x^j$-axis: that is only at $p$. We study the question of when the $g_{ij}$ can be
made constant (or the $\Gamma^k_{ij}$ zero) in a whole chart, next chapter.
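The point that the simplifications hold at $p$ and nowhere else can be seen numerically. The sketch below takes normal coordinates about the north pole of the unit sphere in $\mathbf{R}^3$, assuming the closed form $\exp(x) = (\sin r \cdot x/r, \cos r)$ with $r = |x|$ for the inverse chart, and computes $g_{ij}$ as dot products of finite-difference tangent vectors. Step sizes and names are our choices.

```python
import math

def exp_north_pole(x):
    """Inverse of the normal chart at N = (0, 0, 1) on the unit sphere."""
    r = math.hypot(x[0], x[1])
    s = 1.0 if r == 0.0 else math.sin(r) / r
    return (s * x[0], s * x[1], math.cos(r))

def metric(x, h=1e-4):
    """g_ij(x) = d_i F . d_j F for F = exp_north_pole, by central differences."""
    d = []
    for i in range(2):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        Fp, Fm = exp_north_pole(plus), exp_north_pole(minus)
        d.append([(a - b) / (2 * h) for a, b in zip(Fp, Fm)])
    return [[sum(a * b for a, b in zip(d[i], d[j])) for j in range(2)]
            for i in range(2)]

def metric_derivatives(x, h=1e-3):
    """dg[k][i][j] = d_k g_ij(x), by central differences."""
    dg = []
    for k in range(2):
        plus, minus = list(x), list(x)
        plus[k] += h
        minus[k] -= h
        gp, gm = metric(plus), metric(minus)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)]
                   for i in range(2)])
    return dg

def largest(dg):
    return max(abs(dg[k][i][j]) for k in range(2) for i in range(2) for j in range(2))

g_at_p = metric((0.0, 0.0))
dg_at_p = largest(metric_derivatives((0.0, 0.0)))
dg_elsewhere = largest(metric_derivatives((0.8, 0.0)))
print(g_at_p, dg_at_p, dg_elsewhere)
```

At $p$ the metric components are $\delta_{ij}$ and their first derivatives vanish, as in A) and C); at the second point the derivatives are visibly non-zero, so the $\Gamma^k_{ij}$ cannot all vanish there.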

It is not unknown for a course in an otherwise respectable physics department to “prove” things on the assumption that the $\Gamma^k_{ij}$ can be vanished,
by a suitable choice of coordinates, over a small region at a time. This error
rests above all on the habit of leaving off the argument of a function, so that
distinction is lost between the numbers $\Gamma^k_{ij}(p)$, which we can make zero by a
suitable choice of chart, and the functions $\Gamma^k_{ij}$ which we usually can’t. (The
distinction is often lost even between the words “number” and “function”, as
when a “vector” is introduced as a set of “numbers” that transform nicely
when the equations given concern vector fields and their coordinate functions. Choice of normal coordinates is one context where this sin against the
light leads not just to confusion but to serious error.)


Exercises IX.2 

1. a) Use Theorem VII.6.04 to show that there is a neighbourhood $U$ of
$0_p \in TM$ and a local flow $U \times ]-\varepsilon, \varepsilon[ \to TM$ for the vector field $x$ of
2.01.

b) Deduce that for the neighbourhood $V = U \cap T_pM$ of the zero of $T_pM$
there is a smooth map $\phi : V \times ]-\varepsilon, \varepsilon[ \to M$ such that for $v \in V$,
$\phi_v : t \mapsto \phi(v, t)$ is a geodesic with $\phi_v^*(0) = v$.

c) Show that if $c : ]-a, a[ \to M$ is a geodesic and $\alpha \in \mathbf{R}$, then the curve
$c_\alpha : ]-\frac{a}{\alpha}, \frac{a}{\alpha}[ \to M : t \mapsto c(\alpha t)$ is also a geodesic, with $c_\alpha^*(0) = \alpha(c^*(0))$.

d) Deduce that if $W = \{\, \frac{\varepsilon}{2} v \mid v \in V \,\}$, then W is an open neighbourhood of $0_p$ in $T_pM$ with a smooth map $\psi : W \times ]-2, 2[ \to M$ such
that for $w \in W$, $\psi_w$ is a geodesic with $\psi_w^*(0) = w$.

e) Deduce that the map $\exp_p : w \mapsto \psi_w(1)$ is well defined on W.

f) Show that the map $t \mapsto \exp_p(tw)$ is exactly $\psi_w$.

2. Deduce from the smoothness (by VII.6.04) of the geodesic flow that
whenever $\exp_p$ is defined in an open neighbourhood $U$ of any tangent
vector, it is $C^\infty$ in $U$.

3. Let Pa be the manifold $\{\, (x, y) \mid y > 0 \,\} \subset \mathbf{R}^2$ with the obvious chart,
and the metric tensor field

$$G_{(x,y)}\big((u, v), (u, v)\big) = \frac{u^2 + v^2}{y^2} .$$

(Our notation Pa is taken from the standard name, Poincaré upper
half-plane, for this manifold with this metric.)

a) Show that …

b) Show that the image of any geodesic is confined to some set $\{\, (x, y) \mid (x - a)^2 + y^2 = r^2 \,\}$, $a, r \in \mathbf{R}$, or $\{\, (x, y) \mid x = a \,\}$, and that Pa is
geodesically complete (Fig. 2.4).

c) Prove that for any point $p$ and geodesic $c$ not through $p$, with domain $\mathbf{R}$, there are infinitely many geodesics through $p$ with domain $\mathbf{R}$ that fail
to meet $c$.

(This breaks Euclid’s parallel postulate in the opposite way to
that in Exercise 1.3: instead of having no “lines” through $p$ not meeting $c$, or exactly one as Euclid proposed, there are more than one. Pa
is an example of a hyperbolic or Lobachevskian non-Euclidean geometry.)

Fig. 2.4

4. Show that the example given in 2.03 is indeed a geodesic, and that any 
other with c(0) = 0, c*(0) = 2e is a restriction of it to a subinterval 
of ]-l,l[. 

3. Global Characterisation 

“Straight” above was interpreted as the opposite of “bent”: c is a geodesic if 
at each t its direction c*(t) is unchanging, by the measure for rate of change 
along curves developed in the last chapter. Now for a long road on the earth 
this local point of view is natural, but for the top of a wall we have another 
test - we compare it to a stretched string. 

Why is a stretched string straight? It has some give - perfectly inelastic 
strings occur only in Applied Maths exams - so there are other positions it 
could occupy. Disturbing it into them, however, takes effort and, disturbing 
force removed, it will relax back into straightness because (neglecting gravity) 
this is its position of least energy. Now being of least energy is a property of 
the position as a whole, a global property, in contrast to the local condition 
1.01, and here it is clearly the physically decisive property. We can generalise 
this global approach to straightness from the Euclidean to the Riemannian 
and pseudo-Riemannian contexts, and show the result equivalent to the local 
one. 

Start by fixing $p, q \in M$ and considering paths $c : [a, b] \to M$ with
$c(a) = p$, $c(b) = q$. If M is embedded in Euclidean space, and each $c$ represents a possible way to lie in M for a piece of elastic of length $b - a$, we
have a clear intuitive idea of “position of least energy”. Notice that this includes being evenly stretched: push the midpoint of the elastic along the set
of points occupied and it will slide back fast, when released, to its previous
position, even though the elastic has been moved to a position of the same
length.

We make one simplification: real elastic, if $p$ and $q$ are too close, has
many positions of zero energy and will rest in any of them. So a section $\delta$
long when relaxed, short enough that we may make the linear approximation
of supposing it evenly stretched in Euclidean space to a length $\nu$, will have
tension proportional to $(\nu - \delta)/\delta$ and energy to $(\nu - \delta)^2/2\delta$ when $\nu > \delta$, 0 when $\nu < \delta$.

This is simplified if we imagine “ideal elastic” for which unstretched length
is always negligible compared to stretched length, and so suppose the energy
of the piece is $\nu^2/2\delta$. Completing the process of linear approximation around
a point by taking the limit as $\delta \to 0$ (going to the derivative), this suggests
that we adopt $c^*(t)$ as the “tension” vector and $\frac{1}{2}(\text{length of } c^*(t))^2$ as the
“energy per unit unstretched length” at $c(t)$. The (length)$^2$ of a vector $v$ is
given by $v \cdot v$, so we are led to make

3.01. Definition. The energy of a smooth curve $c : [a, b] \to M$ when M has
metric tensor field G is the quantity

$$E(c) = \frac{1}{2} \int_a^b G_{c(s)}\big(c^*(s), c^*(s)\big)\, ds , \quad \text{or} \quad \frac{1}{2} \int_a^b c^*(s) \cdot c^*(s)\, ds \text{ for short.}$$

Similar remarks to those about the length integral (VII.5.05; cf. also 
Exercise 1) apply to the existence and meaning of E(c): our motivation 
above of the definition again appeals to the idea of an integral as a kind of 
total, rather than an anti-derivative. 
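
As a minimal numerical sketch of Definition 3.01 (assuming the Euclidean
plane, so that the metric is the ordinary dot product; the helper `energy` and
the two sample curves are hypothetical names chosen for illustration), one can
compare two parametrisations of the same straight segment over [0, 1]. The
evenly stretched one has the smaller energy, as the elastic picture suggests:

```python
def energy(curve, a, b, n=20000):
    """Approximate E(c) = 1/2 * integral over [a, b] of c*(s) . c*(s) ds.

    On each subinterval the tangent vector is replaced by the chord
    (c(s1) - c(s0)) / h, and the integrand is summed with weight h.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        s0, s1 = a + i * h, a + (i + 1) * h
        p0, p1 = curve(s0), curve(s1)
        speed_sq = sum(((y - x) / h) ** 2 for x, y in zip(p0, p1))
        total += 0.5 * speed_sq * h
    return total


def even(s):
    # Evenly stretched: constant speed 1 along the segment from (0,0) to (1,0).
    return (s, 0.0)


def uneven(s):
    # Same image and end points, but bunched up near s = 0.
    return (s * s, 0.0)


E_even = energy(even, 0.0, 1.0)      # exact value 1/2
E_uneven = energy(uneven, 0.0, 1.0)  # exact value 2/3
```

Both curves occupy the same set of points, so the difference in energy comes
entirely from how the parameter is distributed along the segment.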

Warning: only in the Riemannian situation, as above, does E bear any 
relation whatever to anything else called “energy” in physics, and our main 
interest is not in this case. But the discussion above explains the standard 
use of “energy” as its name, and the equally standard factor $\tfrac{1}{2}$
which has historically stuck to it despite being quite irrelevant in the
analysis of geodesics.

Let us then start looking for c with E(c) minimal. (Not exactly what 
we’ll find but, as the elastic example suggests, this is a good place to start.) 
Now, in the case of functions $f : R \to R$, the first move in finding minima
is to find those $x$ where $\frac{df}{dx}(x) = 0$, since as we vary $x$
through a minimum $f$ can be neither increasing nor decreasing, so
$\frac{df}{dx}(x)$ can be neither strictly positive nor negative
(Exercise VII.5.1b). Essentially the same idea applies
here: we look for the curves c such that varying through them, in whatever 
way, involves at c a zero rate of change of E. Of course, there are infinitely 
many independent ways to vary c, but fortunately we do not here need the 
theory of infinite-dimensional manifolds of maps, and of the derivatives of 
functions on them. We just consider the “directional derivatives” of E at c. 
That is, we require the vanishing of the rate of change of E at c, as we change 
path smoothly through c in any particular way. To do this formally we first 
need: 

3.02. Definition. A smooth variation of a curve $c : [a, b] \to M$ from $p$ to
$q$ is a smooth map $V : ]-\varepsilon, \varepsilon[ \times [a, b] \to M$ with
the properties

i) $V(t, a) = p$, $V(t, b) = q$, $\forall t \in ]-\varepsilon, \varepsilon[$

ii) $V(0, s) = c(s)$, $\forall s \in [a, b]$.

We can think of this as a family of paths $V_t : [a, b] \to M : s \mapsto V(t, s)$
for $t \in ]-\varepsilon, \varepsilon[$ with each $V_t$ having the same
end-points as $c$, and $V_0$ actually coinciding with $c$. The formulation in
terms of one map $V$ makes the idea of smooth family of curves more precise:
it is possible for each $V_t$ to be smooth, and each $V^s : t \mapsto V(t, s)$
across $c$ to be smooth, without $V$ as a whole being smooth (Exercise 2).

We want to examine the behaviour of $E(V_t)$ as $t$ varies through 0. Since
this is just a real-valued function of $t$, we know just what we mean by a
derivative of it. We are looking for curves $c$ such that for any variation of
$c$, this derivative is zero. To carry out the necessary calculations neatly
we need a bit more language. We have moved from curves, with domain an
interval in $R$, to maps with their domains in $R^2$. We give these a special
name and make an analogy with Definition VII.3.04 (though “along” looks a
little odd in this context):

3.03. Definition. A parametrised surface in a manifold $M$ is a smooth map
$S$ from a product of intervals (open or closed) $I \times J \subseteq R^2$ to
$M$. (So that a smooth variation of a curve is a highly special parametrised
surface.)

Just as a smooth curve is allowed to cross itself, have zero derivative
etc., so a parametrised surface need not sit very neatly in $M$.

A vector field along $S$ is a smooth map $v : I \times J \to TM$ such that
$\Pi \circ v = S$.

Analogously to the tangent vector field along a curve, if for a point
$(x, y) \in I \times J$ we define $c_1(t) = (x + t, y)$, $c_2(s) = (x, y + s)$
we can set

$$S^*_1(x, y) = (S \circ c_1)^*(0) , \qquad S^*_2(x, y) = (S \circ c_2)^*(0) .$$

Doing this for each $(x, y) \in I \times J$ gives us vector fields $S^*_1$,
$S^*_2$ along $S$.

As with fields along curves, we can use the connection on $M$ to define
covariant differentiation “along the surface” of a vector field $u$ along $S$
that need not be the restriction $w \circ S$ of a vector field $w$ on $M$. We
need in particular two partial differentials of $u$ along $S$:

$\Delta_x u$, where $\Delta_x u(x, y) = \nabla_{S^*_1}(u \circ c_1)(0)$, along
$S \circ c_1 : t \mapsto S(x + t, y)$,

and

$\Delta_y u$, where $\Delta_y u(x, y) = \nabla_{S^*_2}(u \circ c_2)(0)$, along
$S \circ c_2$.

(When we are using other labels such as $(t, s)$ or $(x^1, x^2)$ for points in
$I \times J$, we rename these fields accordingly.)

For the particular case of $S$ being the variation $V$, we see $V^*_2(t, s)$ in
the “ideal elastic” conception as the tension at $V(t, s)$ of a piece $P$ of
elastic whose position is given by the curve $V_t$. Thinking of $V_t$ as
“position of $P$ at time $t$” and so of $V(t, s)$ as “position of a point $s$
of $P$ at time $t$” leads to seeing $V^*_1(t, s)$ as “velocity of the point $s$
at time $t$”. The vector field $s \mapsto V^*_1(0, s)$ along $c$ is called a
variation vector field along $c$ (cf. Exercise 6a). Also, we shall need the
vector field $\Delta_s V^*_2$ along $V$, whose value at $(t, s)$ gives “the
variation (considering size and direction), along $V_t$, through $V_t(s)$, of
tension in elastic with the position $V_t$”. It is the limit as
$\delta \to 0$ of the difference in tension forces at the front and back ends
for a piece of elastic, $\delta$ long and centered on $s$, per unit length.
Hence, this field may be thought of as giving the instantaneous force on each
point $s$ of $P$, at each time $t$.

(For curves in spacetime, of course, the above motivations do not hold, 
but the geometry does. We have already encountered the “elastic force” 
vectors as “acceleration” vectors when s is thought of as time, not position 
on the elastic: a curve is a more general object than anything it represents.) 
For manipulative purposes we need 

3.04. Lemma. If the connection on $M$ is symmetric, then

$$\Delta_s S^*_1(t, s) = \Delta_t S^*_2(t, s) , \qquad \forall (t, s)$$

along any parametrised surface $S : I \times J \to M$.

Proof. If we had set up the language of connections along general maps
(Exercise 3) this would be a corollary of the fact that one symmetric connection
induces another. As it is, the quickest proof is to write the statement down in 
coordinates - since it is purely local we lose nothing by working in a chart - 
and compute (Exercise 4a). 

(N.B. This calculation is more than a check on the Lemma’s assertion: it is 
a good check on your grasp of the various objects involved, and well worth 
doing. Better still, do Exercise 4b.) □ 
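
For orientation only, here is a hedged sketch of how the coordinate
computation behind Lemma 3.04 can be organised. It writes $\Delta_s$,
$\Delta_t$ for the two partial covariant derivatives along $S$, $S^*_1$,
$S^*_2$ for the tangent fields in the $t$ and $s$ directions, and $S^i$ for the
components of $S$ in a chart; it also assumes the index convention
$\nabla_{\partial_j}\partial_k = \Gamma^i_{jk}\partial_i$, which is a choice
made here rather than one fixed by the text:

```latex
\begin{aligned}
(\Delta_s S^*_1)^i &= \frac{\partial^2 S^i}{\partial s\,\partial t}
   + \Gamma^i_{jk}\,\frac{\partial S^j}{\partial s}\,\frac{\partial S^k}{\partial t}, \\
(\Delta_t S^*_2)^i &= \frac{\partial^2 S^i}{\partial t\,\partial s}
   + \Gamma^i_{jk}\,\frac{\partial S^j}{\partial t}\,\frac{\partial S^k}{\partial s}, \\
(\Delta_s S^*_1 - \Delta_t S^*_2)^i &= \left(\Gamma^i_{jk} - \Gamma^i_{kj}\right)
   \frac{\partial S^j}{\partial s}\,\frac{\partial S^k}{\partial t}.
\end{aligned}
```

The mixed partial derivatives cancel because $S$ is smooth, and the remaining
term vanishes when the Christoffel symbols are symmetric in their lower
indices, which is what symmetry of the connection means in a chart.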

3.05. Definition. Writing the energy of the curve $V_t : s \mapsto V(t, s)$ as
$E_V(t)$, the first variation of the energy function at a curve $c$ with
respect to a variation $V$ of $c$ is

$$\left(\frac{dE_V}{dt}\right)(0)$$

or, expanding from 3.01 and 3.03,

$$\frac{1}{2} \left(\frac{d}{dt} \int_a^b V^*_2(\cdot, s) \cdot V^*_2(\cdot, s)\, ds\right)(0)$$

(cf. VII.1.03 on notation.)

3.06. Definition. If $c$ has least energy among “nearby” curves from $p$ to
$q$, $E_V$ must have a minimum at 0 for any variation $V$. Since it is a smooth
function $R \to R$, we must therefore have

$$\left(\frac{dE_V}{dt}\right)(0) = 0$$

for any variation $V$ through $c$: 0 is a critical point for every $E_V$. Thus
we shall look for curves that are energy-critical: that is, those whose first
variations vanish with respect to all $V$. Not all such curves are minima, but
it turns out that “critical” is more important than “minimal” anyway. Our main
tool in the search is the following

3.07. Theorem (First Variation Formula). Using the Levi-Civita connection
on $M$,

$$\left(\frac{dE_V}{dt}\right)(0) = - \int_a^b \Delta_s V^*_2(0, s) \cdot V^*_1(0, s)\, ds .$$

In words for “ideal elastic”, this says that the total rate of change of 
elastic energy at time 0 is the integral over $s \in [a, b]$ of the dot product

−(net force on s at time 0) · (velocity of s at time 0)

which would be trivial if we were summing over a finite set of s’s. (Their 
“kinetic energy” increases at the expense of the “elastic energy” producing 
the force - hence the minus sign.) 

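Proof. For any $s \in [a, b]$, applying VIII.6.02, 6.05 to the curve
$V^s : t \mapsto V(t, s)$,

$$(*) \qquad \frac{d}{dt}\left(V^*_2 \cdot V^*_2\right) = \Delta_t V^*_2 \cdot V^*_2 + V^*_2 \cdot \Delta_t V^*_2 = 2\, V^*_2 \cdot \Delta_t V^*_2$$

as functions $]-\varepsilon, \varepsilon[ \to R$. Hence

$$\begin{aligned}
\left(\frac{dE_V}{dt}\right)(0)
 &= \frac{1}{2} \left(\frac{d}{dt} \left[\int_a^b V^*_2(\cdot, s) \cdot V^*_2(\cdot, s)\, ds\right]\right)(0) \\
 &= \frac{1}{2} \int_a^b \left(\frac{d}{dt}\left(V^*_2(\cdot, s) \cdot V^*_2(\cdot, s)\right)\right)(0)\, ds , \quad \text{by Exercise 5} \\
 &= \frac{1}{2} \int_a^b 2\, V^*_2(0, s) \cdot \Delta_t V^*_2(0, s)\, ds , \quad \text{by } * \\
 &= \int_a^b V^*_2(0, s) \cdot \Delta_t V^*_2(0, s)\, ds \\
 &= \int_a^b V^*_2(0, s) \cdot \Delta_s V^*_1(0, s)\, ds , \quad \text{by 3.04.} \qquad (**)
\end{aligned}$$

By VIII.6.02 (along $V_t$ this time) we have for $t \in ]-\varepsilon, \varepsilon[$

$$\frac{d}{ds}\left(V^*_2(t, \cdot) \cdot V^*_1(t, \cdot)\right) = \Delta_s V^*_2(t, \cdot) \cdot V^*_1(t, \cdot) + V^*_2(t, \cdot) \cdot \Delta_s V^*_1(t, \cdot)$$

as functions $[a, b] \to R$. That is, $s \mapsto V^*_2(t, s) \cdot V^*_1(t, s)$
is an indefinite integral for the function on the right, so, setting $t = 0$

$$\int_a^b \left(\Delta_s V^*_2(0, s) \cdot V^*_1(0, s) + V^*_2(0, s) \cdot \Delta_s V^*_1(0, s)\right) ds
 = V^*_2(0, b) \cdot V^*_1(0, b) - V^*_2(0, a) \cdot V^*_1(0, a) .$$

But $V^*_1(t, a)$ and $V^*_1(t, b)$ are zero for all $t$, since the variation
keeps the end points of $c$ fixed. Therefore $V^*_1(0, b) = 0$,
$V^*_1(0, a) = 0$, and so

$$\int_a^b \left(V^*_2(0, s) \cdot \Delta_s V^*_1(0, s)\right) ds = - \int_a^b \left(\Delta_s V^*_2(0, s) \cdot V^*_1(0, s)\right) ds$$

which combines with ** to prove the theorem. □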

Since $V^*_2(0, s)$ is exactly $c^*(s)$, of course, 3.07 can also be expressed as

$$\left(\frac{dE_V}{dt}\right)(0) = - \int_a^b \left(\nabla_{c^*} c^*(s) \cdot w(s)\right) ds$$

where $w$ is the variation vector field corresponding to $V$. Intuitively, for
elastic, a position $c$ will be an equilibrium exactly when the “force” vector
field $\nabla_{c^*} c^*$ vanishes. In general,

3.08. Corollary. A curve $c$ is energy-critical if and only if it is a geodesic.

Proof. If $c$ is a geodesic, for any variation $V$ we have

$$\left(\frac{dE_V}{dt}\right)(0) = - \int_a^b 0 \cdot V^*_1(0, s)\, ds = \int_a^b 0\, ds = 0$$

so $c$ is energy-critical. Conversely, if $c$ is energy-critical,

$$\int_a^b \left(\nabla_{c^*} c^*(s) \cdot u(s)\right) ds = 0$$

whenever $u$ is a variation vector field; hence by Exercise 6a, whenever $u$ is
a vector field along $c$ with $u(a) = 0_p$, $u(b) = 0_q$. But this implies
(Exercise 6b-e) that

$$\nabla_{c^*} c^* = 0$$

identically, so $c$ is a geodesic. □
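
The First Variation Formula can be checked numerically in the simplest
setting. The sketch below assumes the Euclidean plane, where the covariant
derivative along $c$ is the ordinary second derivative, and uses a
hypothetical curve $c(s) = (s, s^2)$ on [0, 1] with variation
$V(t, s) = c(s) + t\,w(s)$, $w(s) = (0, \sin \pi s)$; the helper names are
illustrative, not taken from the text:

```python
import math


def integrate(g, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def energy_of_variation(t):
    """E_V(t) = 1/2 * integral of |c'(s) + t w'(s)|^2 ds for V(t, s) = c(s) + t w(s)."""
    def integrand(s):
        vx = 1.0                                             # d/ds of s
        vy = 2.0 * s + t * math.pi * math.cos(math.pi * s)   # d/ds of s^2 + t sin(pi s)
        return 0.5 * (vx * vx + vy * vy)
    return integrate(integrand, 0.0, 1.0)


# Left-hand side: derivative of E_V at t = 0, by a central difference.
eps = 1e-4
lhs = (energy_of_variation(eps) - energy_of_variation(-eps)) / (2 * eps)

# Right-hand side: minus the integral of c''(s) . w(s), with c''(s) = (0, 2).
rhs = -integrate(lambda s: 2.0 * math.sin(math.pi * s), 0.0, 1.0)
```

Both sides come out at $-4/\pi$. Since this $c$ is not a straight line, the
first variation is non-zero, in agreement with Corollary 3.08.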

3.09. Length. If we had set out to generalise the idea of “shortest distance”
between two points we would have used instead of $E$ the integral

$$L(c) = \int_a^b \sqrt{|c^*(s) \cdot c^*(s)|}\, ds = \int_a^b \|c^*(s)\|\, ds$$

for length introduced in VII.5.04, and proceeded similarly. This would have
had two disadvantages: first, $\sqrt{|\;\cdot\;|}$ is not differentiable at 0,
which means harder technicalities to handle; second, there are more
length-critical curves
than energy-critical, in an unhelpful way. Consider elastic stretched in Eu¬ 
clidean space: there is one position only which is minimal (or even critical) 
for energy, but we need only pull the middle along to find infinitely many 
other positions achieving the same minimal length. (It is also much clearer, 
for our motivation, why elastic should “want” to minimise energy, as distinct 
from length.) However, a non-null curve of critical length once found, it is 
always possible to rearrange it “evenly” along its image and get a curve of 
critical energy. More precisely, we state without proof: 

3.10. Fact. A non-null length-critical curve is always a reparametrisation of 
an energy-critical one: that is, of a geodesic. (On null curves, cf. 4.02). 

Thus we would find no more interesting curves in M by considering 
length than by using energy, while working harder to find them. Moreover 
we would have missed the unique canonical parametrisation of those we did 
find, which comes almost free with the energy approach (Exercise 7). The 
interested reader is referred to [Spivak(2)], Vol. I, for a proof, though he will
have to extend the arguments slightly for the non-Riemannian case. In the
Riemannian case the infimum of arc length between two points in a connected
manifold actually gives a (nontensorial; VI.1.02) metric, and the metric space
is complete (in the sense given in Appendix, 1.02) if and only if the manifold
is geodesically complete: see [Kobayashi and Nomizu].
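
In the Riemannian case the relation between the two integrals can be made
explicit with the Cauchy-Schwarz inequality. This is a standard estimate
rather than part of the argument above, and it assumes
$c^*(s) \cdot c^*(s) \ge 0$ throughout, so that $\|c^*(s)\|^2 = c^*(s) \cdot c^*(s)$:

```latex
L(c)^2 = \left(\int_a^b \|c^*(s)\| \cdot 1 \, ds\right)^2
\le \left(\int_a^b \|c^*(s)\|^2 \, ds\right)\left(\int_a^b 1 \, ds\right)
= 2\,(b - a)\,E(c),
```

with equality exactly when $\|c^*(s)\|$ is constant. So for a fixed length,
the energy over the domain $[a, b]$ is least for a constant-speed
parametrisation, which is the “evenly stretched” condition of the elastic
picture.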

Exercises IX.3 

1. Show from Definitions VII.5.04, 5.05 and Definition 3.01 that the energy
of an affine curve, in an affine space with a constant metric tensor,
is proportional to the square of its length.

2. a) Show that the function $f$ of Exercise VII.1.2 has all functions
$f_x : R \to R : t \mapsto f(x, t)$ and $f_y : t \mapsto f(t, y)$ smooth,
though it has no derivative at (0,0).

b) Deduce that $V = f|_{]-1,1[ \times [-2,2]}$ does not constitute a smooth
variation of the curve $[-2, 2] \to R : t \mapsto 0$, though each $V_t$ and
$V^s$ is smooth.

3. The extension in 3.03 of Definition VIII.3.04 should have stimulated
the reader’s generalisation reflexes:

What is the appropriate definition for a vector field along any
smooth map $f : M \to N$? (It should reduce to an earlier definition
when $f$ is the identity $M \to M$.)

Define a connection along $f : M \to N$. ($\Delta_t$ gives a connection
along a curve, for instance, and $\Delta_x$, $\Delta_y$ give a connection
along $S$, but in a coordinate-dependent way.)

If $\nabla$ is a connection on $N$, define the induced connection along
$f : M \to N$.

4. a) Prove Lemma 3.04 in components. (You will need the fact that
$\frac{\partial^2}{\partial s\,\partial t} = \frac{\partial^2}{\partial t\,\partial s}$
(Exercise VII.7.1), which is why we do not allow $V$ of Exercise
2b as a variation or parametrised surface.)

b) Or, follow the direction pointed by Exercise 3 far enough to get 
Lemma 3.04 as part of a more general theory. 

5. Prove from VII.5.05 and Exercise VII.7.1c that if $f : R^2 \to R$ is $C^1$
then

$$\frac{d}{dt} \int_a^b f(t, s)\, ds = \int_a^b \frac{\partial f}{\partial t}(t, s)\, ds .$$

6. a) If $u$ is any vector field along $c : [a, b] \to M$ from $p$ to $q$ with
$u(a) = 0_p$, $u(b) = 0_q$, construct a variation $V$ through $c$ such that
$V^*_1(0, s) = u(s)$. (One method: use $\exp_{c(s)}$ to define $V(t, s)$.)

b) If $w$ along $c$ is not identically 0, use continuity to show there is
$x \in [a, b]$, $\delta \in R$ with $B(x, \delta) \subseteq [a, b]$,
$y \in B(x, \delta) \Rightarrow w(y) \ne 0$.

c) Use the smoothness and non-degeneracy of $G$ to construct a vector
field $\mathbf{t}$ along $c$ with $\mathbf{t}(t) \cdot w(t) > 0$,
$t \in B(x, \delta)$. (If $M$ is Riemannian, $\mathbf{t} = w$ will do.)

d) Show that the function

$$f : R \to R : s \mapsto \begin{cases} e^{-1/(1 - s^2)} & \text{if } |s| < 1 \\ 0 & \text{otherwise} \end{cases}$$

is smooth and deduce that

$$u : [a, b] \to TM : t \mapsto f\!\left(\frac{t - x}{\delta}\right) \mathbf{t}(t)$$

is a smooth vector field along $c$ with

$$w(x) \cdot u(x) > 0$$

$$w(t) \cdot u(t) \ge 0 , \quad \forall t \in [a, b]$$

$$u(a) = 0_p , \quad u(b) = 0_q .$$

e) Deduce that if

$$\int_a^b w(s) \cdot u(s)\, ds = 0$$

for all variation vector fields $u$, then $w$ is identically zero.

7. a) Show that for a geodesic $c : [a, b] \to M$ and diffeomorphism
$f : [a', b'] \to [a, b]$, the reparametrised curve $c \circ f$ is a geodesic
if and only if $f$ is affine. Deduce that no reparametrisation of $c$ by a map
$g : [a, b] \to [a, b]$ is a geodesic unless $g$ is the identity. (Thus we have
a unique canonical parametrisation of $c$ with domain $[a, b]$.)

b) If $\tilde{c} : [a, b] \to M$ is an arbitrary reparametrisation of a non-null
geodesic $c$, show that the reparametrisation $\hat{c}$ of $\tilde{c}$ by arc
length is a geodesic. Deduce that $\hat{c}$ is the unique affine
reparametrisation of $c$ with domain $[0, L(c)]$.

c) Why does b) fail for $c$ null? And why does the condition that parallel
transport $\tau$ be an isometry guarantee that a null vector $w$ is carried
to a specific $\tau(w)$ (rather than, say, $2\tau(w)$ which has the same size),
and hence that a) does not fail for $c$ null?


4. Maxima, Minima, Uniqueness 

Elastic will sit, stably, only in a position that is a minimum for energy at least
locally (that is, among nearby paths). Energy and length are both critical,
but not minimal, for the great circle route from Greenwich (England) to Tema
(on the coast of Ghana) via both Poles. By suitable small changes of the
curve we can either diminish or increase its energy, so that even locally this
geodesic is neither a maximum nor a minimum for energy among curves from
Greenwich to Tema. It is called a saddle point for energy, by analogy with the
picture of the simplest situation where this sort of non-extremal criticality can
arise. Fig. 4.1 shows the graph in $R^3$ of the function
$R^2 \to R : (x, y) \mapsto x^2 - y^2$, which has neither a maximum nor a
minimum at (0,0) but has zero derivative there.

Fig. 4.1

We could investigate systematically whether an energy-critical curve was 
minimal, saddle-type or what, by looking at the second variation, which 
corresponds to differentiating a function R —► R a second time. However 
unless E(c) means physical energy it is usually important that a curve be 
critical while wholly irrelevant physically whether it be minimal; likewise 
when the interesting integral is action, or time. Why nature should behave so 
was a great mystery in classical mechanics. Even a “least whatever” principle 
seemed a bit mystical and mediaeval in flavour, when a particle had no way of 
comparing the integral along its actual history with other possibilities. The 
“critical whatever” conditions actually apparent in, for example, Fermat’s 
Principle (often misstated as “a light ray follows the path of least time”: easy 
critical but not locally minimal examples are given in [Poston and Stewart]) 
could not even be seen as a kind of Divine economy drive. In quantum 
theory, however, variation principles are entirely reasonable: “the particle 
goes all possible ways and probably arrives by a route that delivers it in 
phase with the result of nearby routes”, and this turns out to involve the 
criticality condition directly, with no reference to minima. This rationale 
then motivates variational techniques in classical mechanics, considered as 
an approximation to quantum descriptions. This is not the book in which to 
go further into this point, however, particularly as it is so lucidly discussed
in [Feynman] - a work which the reader should in any case read, mark, learn
and inwardly digest.

We shall not, then, set up the machinery of the second variation. But it 
is worth looking at some particular facts which help geometrical insight into 
geodesics. 

First, we have seen that a Riemannian geodesic need not be minimal. If 
it is minimal, it need not be the unique such: consider the circle’s worth of 
geodesics from N.Pole to S.Pole on a sphere, all of the least possible length 
and energy. 

Next - since all along we have assumed in illustration that an affine curve 
in Euclidean space is the unique minimum energy curve with that domain 
and end points - let us prove it. 

4.01. Lemma. Given points $p$, $q$ in an affine space $X$ with vector space $T$
and any constant Riemannian metric $G$, the unique affine map
$f : [a, b] \to X$ with $f(a) = p$, $f(b) = q$ has less energy than any other
curve $c : [a, b] \to X$ from $p$ to $q$.

Proof. It follows at once from the definitions (Exercise 2) and 3.08 that no
other curve can be critical or, therefore, minimal, but we have still to show
that every other curve has more energy. (These are not equivalent: we can
find differentiable curves $f : [-1, 1] \to R$ from $-1$ to 1 with

$$Q(f) = \int_{-1}^{1} (t^2 - 1)^2 \left((f(t))^2 - 1\right)^2 \left((f'(t))^2 + 1\right) dt$$

arbitrarily small, but not zero. (Why? Interpret $Q(f)$ as the energy of
the curve $t \mapsto (t, f(t))$ with respect to a sometimes vanishing “metric
tensor” on $R^2$.) So although there is a $Q$-critical function, there is no
$f$ with $Q(f) \le Q(g)$, $\forall g$. Energy is better-behaved if $M$ is
complete and Riemannian (cf. 3.10) - there is then always at least one path of
least energy between any two points - but we have not proved this.)

Define $c_1, c_2 : [a, b] \to T$, by setting $d(p, q) = v$ and

$$c_1(t) = \frac{d(p, c(t)) \cdot v}{v \cdot v}\, v , \qquad c_2(t) = d(p, c(t)) - c_1(t) .$$

The curve $c_1$ is “the part along $v$ of $c$”, with $c_1(a) = 0$,
$c_1(b) = v$, and $c_2$ “the part orthogonal to $v$” (Fig. 4.2).

It is clear that (using the corresponding Riemannian metric on $T$)

$$E(c) = E(c_1) + E(c_2)$$

since, at each point in the integration,

$$\begin{aligned}
c^*(s) \cdot c^*(s) &= (c_1 + c_2)^*(s) \cdot (c_1 + c_2)^*(s) \\
 &= (c_1^*(s) + c_2^*(s)) \cdot (c_1^*(s) + c_2^*(s)) , \quad \text{freeing the tangent vectors } c_i^* \text{ to } T \\
 &= c_1^*(s) \cdot c_1^*(s) + c_2^*(s) \cdot c_2^*(s) ,
\end{aligned}$$

as $c_1^*$, $c_2^*$ are tangent to orthogonal subspaces and so themselves
orthogonal.

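Hence, since in the Riemannian situation $E$ is always positive and only
vanishes on constant curves,

$$E(c) \ge E(c_1)$$

with equality only when $c_2$ is identically 0. So if the image of $c$ is not
confined to the affine hull $V$ of $\{p, q\}$ (cf. II.1.03), then we can reduce
its energy by orthogonal projection onto $V$. It therefore suffices to consider
$c$ of the form $c(t) = p + \tilde{c}(t)v$, where $\tilde{c} : [a, b] \to R$
has $\tilde{c}(a) = 0$, $\tilde{c}(b) = 1$. We then have

$$c^*(t) = \frac{d\tilde{c}}{dt}(t)\, v , \quad\text{while}\quad f^*(t) = \frac{v}{b - a}$$

(freeing vectors where convenient), and

$$(*) \qquad E(f) = \frac{1}{2} \int_a^b \frac{v \cdot v}{(b - a)^2}\, ds = \frac{1}{2}\, \frac{v \cdot v}{b - a} .$$

So if $\hat{c}(t) = d(f(t), c(t))$, we have

$$\begin{aligned}
E(\hat{c}) &= \frac{1}{2} \int_a^b (c^*(s) - f^*(s)) \cdot (c^*(s) - f^*(s))\, ds \quad \text{(freeing the tangent vectors)} \\
 &= \frac{1}{2} \int_a^b c^*(s) \cdot c^*(s)\, ds - \int_a^b \left(\frac{d\tilde{c}}{ds}(s)\, v\right) \cdot \frac{v}{b - a}\, ds + \frac{1}{2} \int_a^b f^*(s) \cdot f^*(s)\, ds \\
 &= E(c) - \frac{v \cdot v}{b - a} \int_a^b \frac{d\tilde{c}}{ds}(s)\, ds + E(f) \\
 &= E(c) - \frac{v \cdot v}{b - a} \left(\tilde{c}(b) - \tilde{c}(a)\right) + E(f) \\
 &= E(c) - E(f) \quad \text{by } *.
\end{aligned}$$

So $E(c) \ge E(f)$, with equality only when $E(\hat{c}) = 0$, that is when
$\hat{c}$ is constant: precisely when $c = f$, since $c(a) = f(a)$. □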
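
The identity at the heart of this proof, that the energy of $c$ is the energy
of $f$ plus the energy of the difference curve, is easy to test numerically in
one dimension. The sketch below assumes $X = R$ with the usual metric,
$[a, b] = [0, 1]$, $p = 0$, $q = 1$ (so $v = 1$), and a hypothetical competitor
$c(t) = t^2$; the function names are illustrative only:

```python
def integrate(g, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def energy(speed, a=0.0, b=1.0):
    """E = 1/2 * integral of speed(s)^2 ds for a curve in R with tangent `speed`."""
    return 0.5 * integrate(lambda s: speed(s) ** 2, a, b)


E_f = energy(lambda s: 1.0)               # affine curve f(t) = t
E_c = energy(lambda s: 2.0 * s)           # competitor c(t) = t^2, same end points
E_diff = energy(lambda s: 2.0 * s - 1.0)  # difference curve, tangent c* - f*
```

Here `E_c` is 2/3, `E_f` is 1/2 and `E_diff` is 1/6, so the excess energy of
the competitor is exactly the energy of its deviation from the affine curve.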

4.02. Other Cases. A similar proof to 4.01 (Exercise 3) shows the affine
path from $p$ to $q$, in a Riemannian $X$, to have the shortest possible length,
sharing this length only with its reparametrisations. A minor adaptation
(Exercise 4) of the same technique shows that if exactly one vector in an
orthonormal basis for $T$ is timelike, a timelike geodesic has maximum length
among timelike curves from $p$ to $q$ measured in the corresponding constant
metric, though not maximum energy. If there is more than one spacelike
dimension, spacelike geodesics are neither maximal nor minimal. Fig. 4.3 shows
such a geodesic $f$ from $p$ to $q$, together with spacelike curves $c_1$,
$c_2$ (with length and energy closer to 0 than those of $f$, and further,
respectively).

Defining the length $L(c)$ of $c$ in terms of $\|\;\|$ rather than $|\;|$ makes
it automatically real and non-negative, so that any null curve is trivially
minimal, geodesic or not. With respect to energy, a null geodesic $f$ is a
saddle point. For we can take a variation $V$ of $f$ that changes only the
parametrisation. Thus each $V_t$ is a null curve, so $E_V$ is identically zero.
But since only one parametrisation with the given domain is energy-critical,
this means that arbitrarily near $f$ there are non-critical curves with the same
(zero) energy. But this is only possible in the saddle situation (Fig. 4.4; there
must be points arbitrarily near $x$ where $h$ takes values either above or below
$h(x)$.)

Fig. 4.4

4.03. Riemannian Geodesics. We have seen that a geodesic on the sphere 
need not be - even among nearby geodesics - a minimal curve between its 
ends. Intuitively the failure of the Polar route from Greenwich to Tema to 
be a minimum stems from taking too large a piece of a great circle. In fact, 
any sufficiently small piece of a geodesic in a Riemannian manifold will be a 
minimum (Exercise 5). 

One way to develop intuition about Riemannian geodesics, clearly, is 
to tighten elastic strings in various ways around potatoes, husbands, and 
any other objects with interesting surfaces. Another, allowing 3-dimensional 
manifolds not just surfaces, is to consider systems of lenses and refractive 
materials generally. 

To keep things $C^\infty$, assume that the refractive index $k(x)$ at a point
$x$ in the interesting space $X$ gives a smooth function $X \to R$. (Assuming
that a lens “fades out” through a thin boundary layer violates reality no worse
than supposing that $k$ has a discontinuity through a $C^\infty$ surface
defining a boundary of zero thickness, and in this context saves
technicalities.) Now, in classical optics, ignoring polarisation, light travels
through $X$ with speed $1/k(x)$, choosing units as in Chap. 0.§3 to make its
speed 1 in vacuum. If however we define a new Riemannian metric $\tilde{G}$
from the usual constant one $G$ on $X$ (supposed affine) by

$$\tilde{G}(x) = k^2(x)\, G(x)$$

according to $\tilde{G}$ a path $f$ of a light ray has constant speed

$$\left[\tilde{G}(f^*(t), f^*(t))\right]^{1/2} = k(x)\,[\text{usual speed}] = 1 ,$$

independently of $t$. Moreover Fermat’s principle (light rays take paths of
critical time) becomes the statement that $f$ has critical length as measured
by $\tilde{G}$: time taken is given exactly by $L(f)$. So $f$ is a
parametrisation of a geodesic, and in fact is a geodesic since it has constant
speed. So by altering the geometry, we have rescued the principle that light
travels in (now generalised) straight lines even when not in a medium of
constant refractive index, and incorporated the physics into the geometry
(cf. also Exercise 6).
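
A small numerical sketch shows the rescaled metric at work. It departs from
the smoothness assumption above by using the idealised sharp interface (two
half-planes with constant indices), and all names and numbers in it are
hypothetical. Length in the metric $k^2 G$ of a broken straight path is then
(index) × (Euclidean length) summed over the two pieces, and minimising it over
the crossing point should reproduce Snell’s law:

```python
import math

K1, K2 = 1.0, 1.5      # assumed refractive indices above / below the line y = 0
A = (0.0, 1.0)         # start point, in the upper medium
B = (1.0, -1.0)        # end point, in the lower medium


def optical_length(x):
    """Length, in the metric k^2 G, of the broken path A -> (x, 0) -> B."""
    upper = math.hypot(x - A[0], A[1])
    lower = math.hypot(B[0] - x, B[1])
    return K1 * upper + K2 * lower


def minimise(func, lo, hi, iterations=200):
    """Ternary search for the minimum of a convex function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


x_star = minimise(optical_length, A[0], B[0])

# Sines of the angles the two segments make with the normal to the interface.
sin_upper = (x_star - A[0]) / math.hypot(x_star - A[0], A[1])
sin_lower = (B[0] - x_star) / math.hypot(B[0] - x_star, B[1])
snell_residual = K1 * sin_upper - K2 * sin_lower
```

The residual vanishes to numerical accuracy, so the path of critical (here
minimal) length in the rescaled metric is the refracted ray. In general
Fermat’s principle only asserts criticality; this example happens to be a
minimum.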

This is closely analogous to general relativity’s changing Newton’s
“particles follow straight lines at constant speed in the absence of
gravitational forces” to “particles follow geodesics in spacetime”, except that
it is avoidable. For gravitation, only geometry seems to work.

4.04. Pseudo-Riemannian Geodesics. In the pseudo-Riemannian case,
as in the Riemannian, there can be more than one geodesic between two
points. Indeed, let $M = R \times S^2$ with the metric tensor induced by the
obvious inclusion $R \times S^2 \hookrightarrow R \times R^3 = R^4$ into $R^4$
with the Minkowski metric. (So that each $\{t\} \times S^2$ has all vectors
tangent to it spacelike.) Exercise 7 illustrates on $M$ some of the things
that are possible; more appear in §5 and §6.

Sufficiently small bits of a timelike geodesic, in a manifold with one time¬ 
like dimension, are maximal for length (Exercise 8) though not for energy, 
since that is false even in the affine case (4.02, Exercise 4b). As with mini¬ 
mality in the Riemannian case, this may fail for larger pieces (cf. for example, 
Exercise 7b). Null and spacelike geodesics are always saddle-type criticalities, 
even in small pieces, by the same arguments as before. 

4.05. Twins. The Twins “paradox” (Chap. 0.§3) is not a logical problem. It 
is an experimental fact about measurements of time which is neatly modelled 
(hence “explained”) by pseudo-Riemannian geometry. But since from time 
to time any physicist is trapped by some Philosopher of Science who is proud 
of not understanding equations, he should be equipped with some arguments 
simple enough for the Philosopher to understand. 

The quantity (time$^2$ − distance$^2$), separating two events, is known
experimentally to be independent of who measures the separate times and
distances and how fast he is going, up to differences in velocity very close to
the speed of light. (This is contrary to Newton’s theory, since there distance
between non-simultaneous events depends on choice of “rest velocity” while
time does not.) A new physical theory might alter the philosophy of this
fact as profoundly as relativity alters Newtonian gravitation, but would
involve only very small numerical changes if it still agreed with experiment.
(Just as Newtonian mechanics remains accurate enough for ICBMs.) So the
consequences of this experimental fact will not alter much numerically for
situations already studied.

One of these consequences is that along an affine world-line (supposing
for the present that spacetime is affine), the length given by the Lorentz
metric for $d(f(b), f(a))$ is the appropriate perceived time for an observer
following $f$ from $f(a)$ to $f(b)$, since for him (distance)$^2$ between
$f(a)$ and $f(b)$ is zero. It is then less a postulate of relativity, special
or general, than a basic notion of the calculus that
$\sqrt{f^*(s) \cdot f^*(s)}$ is the rate of change of perceived or proper¹
time for an observer whose motion is described by any timelike curve $f$ - not
just an affine curve. Differentiation is linear or affine approximation, so if
the calculus is applicable we can make an arbitrarily good affine description
by taking a small enough bit. In an affine motion, the arguments for
$\sqrt{f^*(s) \cdot f^*(s)}$ being the (constant) rate of change of proper
time with $s$ are overwhelming.

Equally basic is the notion that, for a quantity changing with $s$ at a
rate depending on $s$, you get the total change between $s = a$ and $s = b$
by integrating the rate. Hence the appropriate “elapsed proper time” along 
a timelike curve is the integral we have called length. This will remain an 
exceedingly good approximation to observed or predicted lapse of proper 
time, even if relativity is replaced by something else and the same happens 
to the calculus: both fit the facts too well to fail to approximate anything 
that fits better. 

Suppose spacetime is isomorphic to Minkowski space (such an isomor¬ 
phism, preserving the metric, is called an inertial frame and only exists - 
even locally - if spacetime is flat: cf. Chapter X). We see then by 4.02 that 
along any timelike curve $c$ from $p$ to $q$, that is not a reparametrisation of
the affine one $f$ parametrised by arc length, proper time measured along $c$ is
less than that along $f$. This conclusion does not depend on $c$ being an
“inertial movement”, it depends only on the existence of an inertial frame. The
Philosopher who says “But if one observer is accelerating, his observations are
no longer referred to an inertial frame and special relativity is inapplicable” 
either has never seen the right pictures drawn to explain the calculus, or has 
not been told that special relativity assumes only that spacetime has the ge¬ 
ometry of Minkowski space (a statement without “observers”) and that the 
calculus is applicable. 
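
The comparison of elapsed proper times needs nothing beyond the length
integral. The sketch below assumes two-dimensional Minkowski space with units
in which the speed of light is 1, and restricts to piecewise affine
world-lines so that the integral becomes a finite sum; the function name and
the sample events are hypothetical:

```python
import math


def proper_time(events):
    """Length of a piecewise affine timelike world-line.

    `events` is a list of (t, x) pairs; on each affine segment the elapsed
    proper time is sqrt(dt^2 - dx^2), the Lorentz length of the separation.
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        interval = dt * dt - dx * dx
        if interval <= 0:
            raise ValueError("segment is not timelike")
        total += math.sqrt(interval)
    return total


# Stay-at-home twin: the affine world-line from (0, 0) to (10, 0).
home = proper_time([(0.0, 0.0), (10.0, 0.0)])

# Travelling twin: out at speed 4/5, then back at speed 4/5.
traveller = proper_time([(0.0, 0.0), (5.0, 4.0), (10.0, 0.0)])
```

The affine world-line gives 10 units of proper time and the broken one gives
6, in line with 4.02: every other timelike curve between the two events is
shorter than the affine one.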


1 “Proper” here does not mean “right”: it is older English usage for “tied to the particular
person or thing”, as in “property”.

General relativity is different in two respects. First, it ceases to suppose
that spacetime has an affine structure, and makes the weaker assumption
(justified by local experiments) that it is a Lorentz manifold. Second, it
relates the metric to the distribution of matter. We shall discuss the second
point in Chapter XII; for the present discussion, all we need is the first. The
above reasoning still justifies considering the length integral as elapsed proper
time (recall that $T_pM$ is exactly the flat approximation to the manifold $M$
at $p$, just as $D_pf : T_pM \to T_{f(p)}N$ is the linear approximation at $p$
to the map $f : M \to N$) and the observations of 4.04 apply.

In this situation two timelike geodesics, not just two curves, from the 
same point in spacetime can meet again, and observers travelling them can 
compare watches. This confuses even some of the Philosophers who have 
got used to the special theory. Before, they could see the difference between 
the geodesic curve and the other, but now with both observers “inertial” 
shouldn’t symmetry guarantee equal elapsed times? 

Not if the matter distribution influencing the metric has any asymmetry 
of its own. Exactly analogously, why should the two geodesics in Exercise 6 
have the same length? Even if the spacetime is highly symmetric and the 
end points symmetrically placed, the geodesics need not be symmetrically 
related (Exercise 7c). 

Enough, or even too much, on points that should be obvious. One final 
remark: certain journals are given to carrying acrimonious disputes about 
this “paradox”, in coordinates yet. To print material at that level is a waste 
of precious trees. 

Exercises IX.4 

1. Find all three minima, and both maxima, of the function
$f : R \to R : x \mapsto |x^3 - 6x^2 + 11x - 6|$. Comment on the relationship of
smoothness to the rule “differentiate to find the minimum”, and on the relative
difficulty of the length and energy variational problems.

2. A geodesic curve in an affine space with a constant metric tensor field 
is necessarily affine (argue geometrically, or use 1.04 and VIII.6.06 for 
components.), and vice versa. 

3. a) Reduce the minimum length problem from p to q , in an affine space X 

with a constant Riemannian metric, to one dimension in the manner 
of 4.01. 

b) Any non-injective curve in R with the usual metric has greater length 
than an injective one with the same end points. 

c) Deduce that any path with a length no greater than $\|d(p, q)\|$ is a
reparametrisation of the affine curve from $p$ to $q$ with domain [0,1].

4. a) If the affine space $X$ has a constant indefinite metric of signature
$(2 - \dim X)$, and $d(p, q)$ is timelike, let $f$ be an affine curve from $p$
to $q$. Show that any timelike curve from $p$ to $q$ that is not a
reparametrisation of $f$ has smaller length than $f$.

b) Find a variation of $f$ along which $E_V$ attains a maximum at 0 (vary
the route), and one along which $E_V$ attains a minimum at 0 (vary the
parametrisation.)

Thus $f$ is a local maximum for $L$, a saddle point for $E$.

5. a) Let M be Riemannian, and $U \subset T_xM$, $V \subset M$ have
$\exp_x|_U$ a diffeomorphism $U \to V$. Find a ball
$B_\delta = \{\, t \mid \|t\| < \delta \,\} \subset U$, $v \in T_xM$
with $\|v\| = \delta$, and let $y = \exp_x(v)$,
$\tilde f : [0,1] \to T_xM : t \mapsto tv$.
Decompose an arbitrary curve $\tilde c : [0,1] \to T_xM$ from 0 to v into a
“radial part” $\tilde c_1 : t \mapsto \|\tilde c(t)\|$ and a “spherical part”
$\tilde c_2 : t \mapsto \tilde c(t)/\|\tilde c(t)\|$ to
show that if the image $\tilde C$ of $\tilde c$ is contained in U, but not in $B_\delta$, then

$$E(c) > \tfrac12\delta^2 = E(f) , \qquad L(c) > \delta = L(f)$$

where $c = \exp_x \circ\, \tilde c$, $f = \exp_x \circ\, \tilde f$.

Deduce that this is true even if $\tilde C \not\subseteq U$, and show that for any
$g : [0,1] \to M$ from x to y,

$E(g) \geq E(f)$ with equality only if $g = f$,

$L(g) \geq L(f)$ with equality only if $g = f \circ h$, $h : [0,1] \to [0,1]$.

b) Deduce that for a geodesic $c : [a,b] \to M$ and any point $t \in [a,b]$ there
is an $\varepsilon$ such that whenever $s \in\, ]t, t + \varepsilon[$, $c|_{[t,s]}$ is

(i) the curve of least energy from c(t) to c(s) with domain [t, s]

(ii) shorter than any other curve from c(t) to c(s) that is not a
reparametrisation of it.

6. A mirage is due to air near the ground becoming hotter than that 
above, and consequently having a lower density and refractive index. 
Model this mathematically as in 4.03 (using accurate numbers, or 
making them up) to get the two geodesic light rays shown in Fig. 4.5. 
Which of these (if either) is minimal for length and/or energy among 
nearby curves? (See [Poston and Stewart] for more of the geometry of 
mirages.)

7. Let M be $S^2 \times \mathbf{R}$ with the indefinite metric described in 4.04.

a) Show that any geodesic in M has the form $f(s) = (f^1(s), f^2(s)) \in
S^2 \times \mathbf{R}$, where $f^1$ is a geodesic in $S^2$ with its usual Riemannian metric
and $f^2$ is an affine map $J \to \mathbf{R}$, and that any curve of this form is a
geodesic. (Hint: Exercise 1.4)

b) Show that there are infinitely many distinct spacelike geodesics
parametrised by arc length between any two points.

c) Show that unless $(p^1, p^2), (q^1, q^2) \in S^2 \times \mathbf{R}$ have $p^1$ equal to or
diametrically opposite to $q^1$, there are only finitely many timelike geodesics
parametrised by arc length (if any) between them. If there are several,
are they of equal lengths?

d) If $p^1$, $q^1$ are equal or opposite, then to each of the infinite families of
equal length geodesics from $p^1$ to $q^1$ in $S^2$ there corresponds a family
of geodesics from p to q in M; only finitely many of these families are
timelike.

8. Let M be an (n + 1)-manifold, with metric G of signature (1 − n), and
$p \in M$. Choose normal coordinates $(x^0, \ldots, x^n)$ around p ($x^0$ being
timelike) and if $y = \exp_p(v)$ for v timelike has coordinates $(v^0, \ldots, v^n)$,
define

= .= . 

where t(v) = G(v,v). 

a) Show that this defines an admissible chart (whose domain U does not
include p, though its closure does) and that for any vector $u = u^i\partial_i$ at a
point in U,

G(u,u) = (u 0 ) 2 — 7 y«*V 

where the 7 are the components of a negative definite bilinear form. 

b) Deduce that if $q \in U$, there is a unique geodesic $c : [0,1] \to M$ from p
to q with image confined to $U \cup \{p\}$ and that any timelike curve from p
to q with image in $U \cup \{p\}$ is either shorter than, or a reparametrisation
of, c.

c) Compare and contrast this result with Exercise 5. 

9. a) Suppose spacetime X is isomorphic to Minkowski space, and that
someone has invented a “hyperdrive” by which we can “travel at twice
the speed of light”: that is, suppose that curves $f = (f^0, f^1, f^2, f^3)$ in
$\mathbf{R}^4$ are descriptions of possible motions provided that



always. If any unit timelike vector tangent to X may be chosen as “rest
velocity” in setting up affine coordinates before applying *, show that
for any two points p, q with d(p, q) spacelike the geodesic between
them satisfies * for a suitable choice of rest velocity.

b) Suppose that the choice of “rest velocity” is automatically the
velocity of your spacecraft at the moment of pressing the “hyperdrive”
button B. If p is “here and now”, and q is “Alpha Centauri one year
ago”, to get from p to q how fast and in what direction would you
initially leave the Solar System before pressing B? (Assume Alpha
Centauri is 4 light years off, for simplicity, and ignore acceleration
time.)

Give your answer in “sun is at rest” terms, as we have given q. 

c) Suggest a way to control your “spacelike direction” after pressing B. 

d) Comment on the prevalence of science-fiction stories in which
“hyperdrive”, but not time travel, is involved. How many include an idea
which prevents uses like the above for the “hyperdrive”?


5. Geodesics in Embedded Manifolds 


Clearly, a geodesic in a manifold M embedded in an affine space X (like a
great circle in $S^2 \subset \mathbf{R}^3$) is not in general a geodesic in X; M forces it to
bend. But it must bend only as M forces it to:

5.01. Theorem. If M is embedded by the inclusion $i : M \hookrightarrow X$ in an affine
space with constant metric tensor G, denote covariant differentiation along
curves in X by $\tilde\nabla_{\tilde c^*}$, and along curves in M (with respect to the Levi-Civita
connection of the metric induced on M) by $\nabla_{c^*}$. Then for $c : J \to M$ we
have, with $i \circ c = \tilde c : J \to X$,

$$\nabla_{c^*}c^*(t) = 0$$

if and only if

$$G\big(\tilde\nabla_{\tilde c^*}\tilde c^*(t),\, v\big) = 0 , \qquad \forall v \in T_{c(t)}M .$$

Thus c is a geodesic in M if and only if the “net elastic force” on each point
(in the “elastic” conception) or the “acceleration vector” at each moment (in
the “motion of a particle”) is always orthogonal to M.

Proof. Apply Exercise VIII.2.1a and Exercise VIII.6.3b and the relation
VIII.3.05 between $\nabla$ and $\tilde\nabla$. (The theorem is true for X a general
manifold and G arbitrary, as long as we use the metric induced by G on M, but
we shall not need this.) □
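
As a small numerical illustration of 5.01 (a sketch only, for the unit sphere in Euclidean $\mathbf{R}^3$; the function names are ours, not the book's), the following compares a great circle with a parallel of latitude. In affine coordinates with a constant metric, covariant differentiation in X is ordinary differentiation, so the acceleration is just the second derivative, and we measure how much of it is tangent to the sphere.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tangential_part(p, a):
    """Part of the vector a that is tangent to the unit sphere at p.

    On the unit sphere the outward unit normal at p is p itself, so we
    subtract the normal component (a . p) p.
    """
    k = dot(a, p)
    return tuple(ai - k * pi for ai, pi in zip(a, p))

def great_circle(t):
    """Equator at unit speed: position and second derivative in R^3."""
    pos = (math.cos(t), math.sin(t), 0.0)
    acc = (-math.cos(t), -math.sin(t), 0.0)
    return pos, acc

def latitude_circle(t, lat=math.pi / 4):
    """Parallel of latitude `lat`, traversed at constant angular speed."""
    c, s = math.cos(lat), math.sin(lat)
    pos = (c * math.cos(t), c * math.sin(t), s)
    acc = (-c * math.cos(t), -c * math.sin(t), 0.0)
    return pos, acc

def tangential_acceleration(curve, t):
    """Length of the tangential part of the acceleration at parameter t."""
    pos, acc = curve(t)
    v = tangential_part(pos, acc)
    return math.sqrt(dot(v, v))

for name, curve in (("great circle", great_circle),
                    ("45 N parallel", latitude_circle)):
    print(name, tangential_acceleration(curve, 0.7))
```

The great circle has acceleration orthogonal to the sphere, as the theorem requires of a geodesic; the parallel of latitude has a tangential component (of length cos(lat) sin(lat)), so it is not one.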

Among the “motion of a particle” examples are the spherical pendulum
(M is an $S^2$ in $\mathbf{R}^3$) with no external forces, or if c(t) is a position in $\mathbf{R}^3$
for a classical pair of point masses joined by a light rigid rod of length l it
corresponds to a point in the 5-dimensional manifold (cf. Exercise VII.2.1b,
Exercise VII.2.8c)

$$M = \{\, (x^1, x^2, x^3, y^1, y^2, y^3) \mid (x^1 - y^1)^2 + (x^2 - y^2)^2 + (x^3 - y^3)^2 = l^2 \,\}
\subset \mathbf{R}^3 \times \mathbf{R}^3 = \mathbf{R}^6 ,$$

and the condition that “no external forces act” is exactly that the derivative
of c* be always normal to M, in the usual Riemannian metric on $\mathbf{R}^6$.

5.01 is the differential justification for the “string-stretching” idea of 
geodesics we have been using, in embedded Riemannian manifolds (the local 
minimisation of length, subject to the constraint of lying in M, being an 
integral one). It is useful equally as a way to build intuition in the pseudo- 
Riemannian case, to which string will not stretch. 

5.02. An Indefinite Metric on Part of the Sphere. Let
$M = \{\, (x, y, z) \in \mathbf{R}^3 \mid x^2 + y^2 + z^2 = 1,\ z^2 < \tfrac12 \,\}$,
the part of the 2-sphere lying strictly between
the 45° S and 45° N parallels of latitude. Give $\mathbf{R}^3$ the constant metric G

$$(ds)^2 = (dx)^2 + (dy)^2 - (dz)^2$$

in “line element” notation, and call $\mathbf{R}^3$ with this metric X.

Then define a chart $\psi : U \to \mathbf{R}^2$ on M by $\psi(p) = (\theta(p), \phi(p))$ where $\theta(p)$
means “latitude Northwards” and $\phi(p)$ “longitude Eastward” of p (Fig. 5.1).
The metric $\tilde G$ induced on M is, by Exercise 1c,

$$(ds)^2 = \cos^2\theta\,(d\phi)^2 - \cos 2\theta\,(d\theta)^2 .$$

Thus the $\phi$ direction is timelike and $\theta$ spacelike. (Since $\cos 2\theta = 0$ for
$\theta = \pm 45°$, $\tilde G$ would become degenerate if we included more of the sphere than
M. Geometrically, this is because the tangent plane to the sphere at any
point with $z^2 = \tfrac12$ is degenerate (recall Fig. VII.3.7), and the metric induced
on the top and bottom caps ($z^2 > \tfrac12$) is Riemannian.)

Fig. 5.1
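
The line element of 5.02 can be checked numerically. The sketch below assumes the usual parametrisation of the sphere by latitude and longitude, $(\theta, \phi) \mapsto (\cos\theta\cos\phi, \cos\theta\sin\phi, \sin\theta)$, and approximates the coordinate tangent vectors by central differences; the helper names are illustrative only.

```python
import math

def embed(theta, phi):
    """Point of the unit sphere with latitude theta and longitude phi."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

def G(u, v):
    """The constant indefinite metric (dx)^2 + (dy)^2 - (dz)^2 on R^3."""
    return u[0] * v[0] + u[1] * v[1] - u[2] * v[2]

def partials(theta, phi, h=1e-6):
    """Central-difference approximations to d/dtheta and d/dphi of embed."""
    def diff(p, q):
        return tuple((a - b) / (2 * h) for a, b in zip(p, q))
    d_theta = diff(embed(theta + h, phi), embed(theta - h, phi))
    d_phi = diff(embed(theta, phi + h), embed(theta, phi - h))
    return d_theta, d_phi

def induced_metric(theta, phi):
    """Components (g_theta_theta, g_theta_phi, g_phi_phi) of the induced form."""
    d_theta, d_phi = partials(theta, phi)
    return G(d_theta, d_theta), G(d_theta, d_phi), G(d_phi, d_phi)

g_tt, g_tp, g_pp = induced_metric(0.3, 1.1)
print(g_tt, -math.cos(2 * 0.3))      # these two should agree
print(g_pp, math.cos(0.3) ** 2)      # and so should these
```

At $\theta = \pm 45°$ the first component vanishes, which is the degeneracy described in the text.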

What curves in M are geodesics? Clearly by symmetry we have the
meridians and the equator (suitably parametrised). But the other pieces of
great circles are not geodesics in this metric.

Consider a general curve $c : t \mapsto (c^\theta(t), c^\phi(t))$ in M, with $c^\theta(0) > 0$ and
$c^*(0) \neq 0$. Without loss of generality suppose $c^\phi(0) = 0$, so that $p = c(0)$
lies in the x-z plane of X, which we may call Q.

Now from Fig. 5.2a it is clear that the “acceleration vector” $a = \tilde\nabla_{c^*}c^*(0)$
(differentiating in X) must be non-zero and point to the left of the plane P in
X geometrically tangent to M at p, since c “bounces off” P on that side. If c
is geodesic, a must lie in the one-dimensional subspace $V = (T_pM)^\perp \subset T_pX$.
By symmetry V is tangent to Q (otherwise reflection in Q would give us
another V), so as Q has the indefinite metric

$$(ds)^2 = (dx)^2 - (dz)^2$$

we see (recalling IV.1.08) that V is as shown in Fig. 5.2b, in contrast with
the Riemannian picture 5.2c.

Thus a is non-zero, points left and is in V. In terms of the x-z plane Q
and of $\mathbf{R}^3$, then, c has at 0 an “acceleration” with its z-component upwards.
If $c^\theta(0)$ is negative, of course, a similar analysis shows that a points
downwards. Thus the geodesics in M are qualitatively as illustrated in Fig. 5.3.
(Null vectors, and hence the spacelike ones squeezed between them, tend to
multiples of $\partial_\theta$ as $\theta \to \pm 45°$. Thus null and spacelike geodesics tend to
tangency with meridians as $c^\theta \to \pm 45°$. The fact that timelike geodesics do
likewise can be seen by examining the way that “upward acceleration” goes
to infinity.)
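
The direction of V in this argument can be made concrete. The sketch below assumes p = (cos θ, 0, sin θ) in the x-z plane and uses the coordinate tangent vectors of the latitude-longitude parametrisation; `g_normal` and `euclidean_normal` are hypothetical names for the vectors spanning the normal line in the indefinite and in the usual metric, each chosen to point to the sphere's side of the tangent plane.

```python
import math

def G(u, v):
    """Indefinite metric (dx)^2 + (dy)^2 - (dz)^2 of 5.02."""
    return u[0] * v[0] + u[1] * v[1] - u[2] * v[2]

def euclid(u, v):
    return sum(a * b for a, b in zip(u, v))

def tangent_basis(theta):
    """Coordinate tangent vectors at p = (cos theta, 0, sin theta)."""
    d_theta = (-math.sin(theta), 0.0, math.cos(theta))
    d_phi = (0.0, math.cos(theta), 0.0)
    return d_theta, d_phi

def g_normal(theta):
    """Vector spanning (T_p M)^perp for G, on the sphere's side of P."""
    return (-math.cos(theta), 0.0, math.sin(theta))

def euclidean_normal(theta):
    """Inward normal for the usual Riemannian metric, for comparison."""
    return (-math.cos(theta), 0.0, -math.sin(theta))

theta = 0.3                       # a point in the northern half of M
n = g_normal(theta)
print([G(n, e) for e in tangent_basis(theta)])   # both zero: n is G-normal
print(n[2], euclidean_normal(theta)[2])          # z-components: up vs down
```

For 0 < θ < 45° the G-normal has positive z-component while the Euclidean inward normal has negative z-component, which is the contrast between Fig. 5.2b and Fig. 5.2c.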

Theorem 5.01 thus allows us qualitatively to analyse geodesics without
hard work on the equations

$$\frac{d^2c^k}{ds^2} + \Gamma^k_{ij}\,\frac{dc^i}{ds}\,\frac{dc^j}{ds} = 0 ,$$

when we can obtain the metric from a convenient embedding. (In the words
of Dirac, “I feel I understand a differential equation when I can see what the
solutions look like without actually solving it.”)

Fig. 5.2 (panels (a), (b), (c))

5.03. An Example on the Plane Minus a Point. Consider the submanifold

$$M = \{\, (r, \theta, z) \in \mathbf{R}^3 \mid z = \tfrac{1}{r} - r \,\} ,$$

using cylindrical coordinates on $\mathbf{R}^3$. Evidently we may use $(r, \theta)$ as
coordinates on M, so that M is an embedding of $\mathbf{R}^2 \setminus \{(0,0)\}$ by the map

$$\psi : (r, \theta) \mapsto \big(r, \theta, \tfrac{1}{r} - r\big) .$$

If $\mathbf{R}^3$ has its usual Riemannian metric, string-stretching shows that the
geodesics in M are as shown in Fig. 5.4a, or, in $\mathbf{R}^2 \setminus \{(0,0)\}$ with the metric
induced by $\psi$, as in Fig. 5.4b. Evidently a “free end” of any such geodesic is
asymptotic to a straight line in $\mathbf{R}^2$ with its usual metric, and M is geodesically
complete (Exercise 2), cf. 2.03.

If M has the indefinite metric induced by the metric on $\mathbf{R}^3$ of 5.02,
a curve tangent to one with $c^r$ constant (that is a “static” timelike one)
has an acceleration a in $\mathbf{R}^3$ that points “upwards”; Fig. 5.5 is analogous to
Fig. 5.2a,b. Such tangency is equivalent to $\frac{dc^r}{dt}(t) = 0$, so we see that no
geodesic has a point where $c^r$ is minimal, as with the previous metric: we
only have maxima. Thus a non-radial geodesic must spiral out from the
origin and escape (Fig. 5.6a), spiral in from r infinite (b), or (c) spiral out,
reach a maximum value of $c^r$ and spiral back in again (Exercise 3). At such
a maximum, c* is (trivially) timelike, so only timelike geodesics can look like (c).

Fig. 5.6


Exercises IX.5 

1. a) Let $\tilde c : J \to \mathbf{R}^3$ have $\tilde c(J) \subset S^2 \subset \mathbf{R}^3$, and set
$\tilde c(t) = (c^x(t), c^y(t), c^z(t)) \in \mathbf{R}^3$, $c(t) = (c^\theta(t), c^\phi(t)) \in S^2$
using the labels discussed in
5.02 for as much of $S^2$ as possible. Show that

dc* t , 6 .d<? . , 6 .dc y 

— = cos(c*)(cos(c*)— + sm(r)— 
as v as as 

dc B . / $\ ( . / / S\d < ?\ / 

•^=^(0 )(- S m(c*)— + co S ( c *)— J + a»(c*)-j 7 

as functions J —► R. 

b) Deduce that using the indefinite metric G given on R 3 in 5.02 



c) Deduce that in “line element” notation the tensor field $\tilde G$ induced
by G on $S^2$ is given by

$$(ds)^2 = \cos^2\theta\,(d\phi)^2 - \cos 2\theta\,(d\theta)^2$$

and show this is not a metric tensor on $T_{(\theta,\phi)}S^2$ if $\theta = \pm 45°$.

d) Describe the geodesics on the upper and lower caps ($z^2 > \tfrac12$) with this
induced metric tensor. (Note that $(\tfrac{\pi}{2} - \theta, \phi)$ on a cap essentially gives
polar coordinates $(r, \phi)$, so if you wish to work in coordinates you can
use the metric as $(ds)^2 = \cos 2r\,(dr)^2 + \sin^2 r\,(d\phi)^2$.)

2. a) Show that M as in 5.03, with the Riemannian metric given, is
geodesically complete. (Hint: suppose a geodesic $c : \,]a, b[\, \to M$ cannot be
extended past b. Use the fact that a geodesic in M from p to q is longer
than that in $\mathbf{R}^3$ to show by compactness that $\lim_{t \to b} c(t)$ exists in $\mathbf{R}^3$.
Deduce that $\lim_{t \to b}(c(t))$ exists in M, and obtain a contradiction.)

b) In Fig. 5.4b the “ordinary distance” $c^r(t)$ of the point c(t) from the
origin has a minimum for each geodesic shown. Can $c^r$ have a maximum
for any geodesic?

c) Do all geodesics except the “radial” ones ($c^\theta$ = constant) have minima
for $c^r$? If there are others with no minimum, do they go finitely or
infinitely many times around the origin?

d) Make precise and prove carefully the statement that the free end(s)
of a geodesic in $\mathbf{R}^2 \setminus \{(0,0)\}$ with this metric are asymptotic to affine
path(s) $f : \mathbf{R} \to \mathbf{R}^2$.

3. In this exercise M is as in 5.03, with the indefinite metric. 

a) Using the coordinates $(r, \theta)$ for a point $(r, \theta, z) \in M$, show that the
metric is given by

$$(ds)^2 = r^2\,(d\theta)^2 - \frac{2r^2 + 1}{r^4}\,(dr)^2 .$$

b) Find the coordinate equation for a geodesic $c : t \mapsto (c^r(t), c^\theta(t))$ in M.

c) Show, with or without (b), that null geodesics spiral infinitely many
times around the origin (what does a null geodesic look like in the
embedded picture?). Deduce that the same is true for timelike geodesics.
What about spacelike ones? (cf. Exercise 2c)

d) Find a reparametrisation of $]0, \infty[\, \to M : t \mapsto (t, 0, t^{-1} - t)$ that makes
it a geodesic, and deduce that M is not geodesically complete.

e) Can a timelike geodesic have a free end, as in Fig. 5.6a,b, or must it 
always be “trapped” as in c? 

6. An Example of Lie Group Geometry 

We have not set up any of the general machinery of Lie group theory. But
this example does not require that; we can analyse it explicitly with what
we have. We include it mainly as an illuminating example in our
pseudo-Riemannian geometry. The fact that it brings out some geometric matter
often neglected in courses on Lie groups is a bonus for readers already
studying them. Others can enjoy the example by thinking of a Lie group as a
group that is also a smooth manifold with smooth composition. (Bring
together Exercises I.1.10, VII.2.01 and 2.02.)

Consider the set of unimodular operators on $\mathbf{R}^2$, that is those with
determinant 1. We have established (IV.1.03, Exercise IV.3.6, Exercise VII.2.8c,
Exercise VIII.1.4c) that this has the structure of a 3-manifold with an
indefinite metric tensor field. Via the polarisation identity (Exercise IV.1.7d),
everything involved is defined starting with the function det, which is a
“natural” object definable without coordinates (Exercise V.1.11). In the same
sense, then, we have found a “natural” structure on the set of unimodular
operators $\mathbf{R}^2 \to \mathbf{R}^2$, usually called SL(2;R). (This stands for “the Special
Linear group on 2 Real variables.” The corresponding “General” group, GL(2;R)
mentioned in I.3.01, consists of all operators with non-zero determinant, that
is all invertible ones.) Now, as a group, SL(2;R) has a natural multiplicative
structure - we can compose two unimodular operators and (by I.3.08) get
another. To investigate this, let us use the standard basis for $\mathbf{R}^2$ and the
corresponding matrix labels for operators. The metric tensor we are using
on $L(\mathbf{R}^2;\mathbf{R}^2)$ can be written

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} p & q \\ r & s \end{bmatrix}
= \tfrac12(as + dp) - \tfrac12(br + cq) .$$


6.01. Tangent Vectors at the Identity. By Exercise VIII.1.4 the tangent
vectors to SL(2;R) at any point (operator) A are exactly those orthogonal
to A in the determinant metric tensor, transferred to the space of vectors
tangent to $L(\mathbf{R}^2;\mathbf{R}^2)$ at A. Using “matrix” coordinates on $T_A(L(\mathbf{R}^2;\mathbf{R}^2))$
corresponding to those on $L(\mathbf{R}^2;\mathbf{R}^2)$ itself, this means in particular that for
$B \in T_I(L(\mathbf{R}^2;\mathbf{R}^2))$ with matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ we have

$$B \in T_I(SL(2;\mathbf{R}))
\iff \begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 0
\iff \tfrac12(a + d) = 0
\iff \operatorname{tr} B = 0 .$$


Thus the tangent space at I to SL(2;R) consists of the operators with zero
trace. Such “traceless” operators on $\mathbf{R}^2$ have a curious property: using matrix
multiplication (not the metric tensor),

$$\begin{bmatrix} a & b \\ c & -a \end{bmatrix}^2
= \begin{bmatrix} a^2 + bc & ab - ba \\ ca - ac & cb + a^2 \end{bmatrix}
= (a^2 + bc)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= (-\det B)[I] .$$

In consequence, for any integer $k \geq 0$,

$$B^{2k} = (-\det B)^k I , \qquad B^{2k+1} = (-\det B)^k B .$$
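
The identity is easy to confirm by machine. A minimal sketch, using plain nested lists for 2 × 2 matrices and integer entries so that the comparison is exact (the helper names are ours):

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def scale(s, A):
    return [[s * x for x in row] for row in A]

def power(A, n):
    """A^n for n >= 0 by repeated multiplication."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = matmul(R, A)
    return R

IDENTITY = [[1, 0], [0, 1]]

def check_traceless_powers(B, kmax=4):
    """True if B^(2k) = (-det B)^k I and B^(2k+1) = (-det B)^k B for k <= kmax."""
    assert B[0][0] + B[1][1] == 0, "B must have zero trace"
    m = -det(B)
    return all(power(B, 2 * k) == scale(m ** k, IDENTITY) and
               power(B, 2 * k + 1) == scale(m ** k, B)
               for k in range(kmax + 1))

print(check_traceless_powers([[3, 5], [-2, -3]]))   # det B = 1
print(check_traceless_powers([[1, 2], [3, -1]]))    # det B = -7
```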


6.02. Power Series of Operators. The usual series defining exp(x), alias
$e^x$, for a real number x is

$$1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$

There is an obvious analogue for an operator B. Using a capital E to
distinguish it from the maps defined in 2.04, we set

$$\operatorname{Exp}(B) = I + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \frac{1}{4!}B^4 + \cdots$$

The usual proofs (for which see any elementary analysis text) that the series
for $e^x$ converges absolutely for all x transfer at once to Exp(B). Recall that a
series $\sum_i x_i$ converges, by definition, if its finite partial sums $S_j = \sum_{i=1}^{j} x_i$
converge as a sequence. The partial sums are all in the finite dimensional
vector space $L(\mathbf{R}^2;\mathbf{R}^2)$, so we can use the usual topology on this and define
convergence as in VI.2.01.

Now if B is in $T_I(SL(2;\mathbf{R}))$, by 6.01 we have

$$\operatorname{Exp}(B) = I + B + \frac{1}{2!}(-\det B)I + \frac{1}{3!}(-\det B)B + \frac{1}{4!}(-\det B)^2 I + \cdots$$

Now, the convergence is absolute so we can rearrange the series and leave the
sum unaltered:

$$\operatorname{Exp}(B) = \Big(1 - \frac{1}{2!}\det B + \frac{1}{4!}(\det B)^2 - \cdots\Big) I
+ \Big(1 - \frac{1}{3!}\det B + \frac{1}{5!}(\det B)^2 - \cdots\Big) B .$$

The coefficients of I and B are now power series in the ordinary real number
det B, and can be made to look very familiar. Set $d = \sqrt{|\det B|} = \sqrt{|B \cdot B|}$,
the “size” $\|B\|$ of B (IV.1.07) by the metric tensor we are using. (This is
not the “norm of an operator” used in IV.4.01.) Then if det B is positive it
is $d^2$, and

$$\operatorname{Exp}(B) = \Big(1 - \frac{d^2}{2!} + \frac{d^4}{4!} - \cdots\Big) I
+ \Big(d - \frac{d^3}{3!} + \frac{d^5}{5!} - \cdots\Big)\frac{1}{d}B
= \cos(d)\,I + \sin(d)\Big(\frac{1}{d}B\Big) .$$

If det B = 0, then we have

$$\operatorname{Exp}(B) = I + B .$$

If det B is negative it is $(-d^2)$, so that

$$\operatorname{Exp}(B) = \Big(1 + \frac{d^2}{2!} + \frac{d^4}{4!} + \cdots\Big) I
+ \frac{1}{d}\Big(d + \frac{d^3}{3!} + \frac{d^5}{5!} + \cdots\Big) B
= \cosh(d)\,I + \sinh(d)\Big(\frac{1}{d}B\Big) .$$

The operator $\frac{1}{d}B$ that appears here is just the normalisation (IV.1.06) of B,
the unit vector in the same direction. When B is taken, as in 6.01, as a
tangent vector at I we shall denote by $\hat B$ the result of normalising B if it is
non-null and freeing it (shifting it to the origin of $L(\mathbf{R}^2;\mathbf{R}^2)$). If det B = 0
we define $\hat B$ by freeing B without - since we can’t - normalising it. Then as
a map $T_I(SL(2;\mathbf{R})) \to L(\mathbf{R}^2;\mathbf{R}^2)$, Exp takes the form

$$\operatorname{Exp}(B) = \begin{cases}
\cos(d)\,I + \sin(d)\,\hat B , & \det B > 0 \\
I + \hat B , & \det B = 0 \\
\cosh(d)\,I + \sinh(d)\,\hat B , & \det B < 0.
\end{cases}$$
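
The three closed forms can be compared with the power series directly. The following is a sketch under the assumption that B is a traceless 2 × 2 matrix stored as nested lists; `exp_series` simply truncates the series after a fixed number of terms, which is ample for the small matrices used here.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def scale(s, A):
    return [[s * x for x in row] for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def exp_series(B, terms=40):
    """Partial sum I + B + B^2/2! + ... of the series for Exp(B)."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    term = I2
    for n in range(terms):
        total = add(total, term)
        term = scale(1.0 / (n + 1), matmul(term, B))
    return total

def exp_closed(B):
    """Closed form of 6.02 for traceless B (timelike, null, spacelike)."""
    D = det(B)
    d = math.sqrt(abs(D))
    if D > 0:
        return add(scale(math.cos(d), I2), scale(math.sin(d) / d, B))
    if D < 0:
        return add(scale(math.cosh(d), I2), scale(math.sinh(d) / d, B))
    return add(I2, B)

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

examples = {
    "timelike": [[0.0, 1.3], [-1.3, 0.0]],
    "null": [[1.0, 1.0], [-1.0, -1.0]],
    "spacelike": [[2.0, 0.0], [0.0, -2.0]],
}
for name, B in examples.items():
    print(name, max_abs_diff(exp_series(B), exp_closed(B)))
```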

6.03. The Geometry of Exp. The image $\operatorname{Exp}(T_I(SL(2;\mathbf{R})))$ lies in SL(2;R),
by Exercise 1. What does the mapping Exp look like? It is convenient to use
the orthonormal basis

$$b_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} , \quad
b_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} , \quad
b_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} , \quad
b_4 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

found in Exercise IV.3.6, giving coordinates $(a^1, a^2, a^3, a^4)$ say to an operator
A. In these coordinates SL(2;R) is the set of A satisfying

$$(a^1)^2 + (a^2)^2 - (a^3)^2 - (a^4)^2 = 1 ,$$

a sort of 3-dimensional “hyperboloid” in $\mathbf{R}^4$. We can only draw slices of this;
for instance, fixing $a^4 = 0$ gives Fig. 6.1. This picture turns out to be fairly
adequate for our present purposes.
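
To see the coordinates at work, one can solve $A = a^1 b_1 + a^2 b_2 + a^3 b_3 + a^4 b_4$ for the $a^i$ entry by entry. The sketch below does this for a matrix with entries a, b, c, d (the function names are ours) and confirms that the quadratic form reproduces the determinant.

```python
def coords(A):
    """Coordinates (a1, a2, a3, a4) of A in the basis b1, b2, b3, b4.

    With A = [[a, b], [c, d]] the expansion gives
    a = a1 + a3, d = a1 - a3, b = a2 + a4, c = -a2 + a4.
    """
    (a, b), (c, d) = A
    return ((a + d) / 2, (b - c) / 2, (a - d) / 2, (b + c) / 2)

def quadratic_form(x):
    a1, a2, a3, a4 = x
    return a1 ** 2 + a2 ** 2 - a3 ** 2 - a4 ** 2

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[2, 3], [1, 2]]            # det A = 1, so A lies in SL(2;R)
print(coords(A), quadratic_form(coords(A)), det(A))
```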


Fig. 6.1

Fig. 6.2


Suppose B is a tangent vector at I, not just to SL(2;R) but to the $a^4 = 0$
slice of it. Then by 6.02 Exp(B) is a linear combination of B itself, shifted
to the origin, and I. Hence it lies in the same slice.

Consider B with det B = 1, a unit timelike vector in $T_I(SL(2;\mathbf{R}))$. Then
for any $tB \neq 0$, the normalised vector $\widehat{tB}$ is either $+\hat B$ or $-\hat B$, according to
the sign of t. So for such a B,

$$\operatorname{Exp}(tB) = \cos(t)\,I + \sin(t)\,\hat B .$$

The line $\{\, tB \mid t \in \mathbf{R} \,\}$ is thus wrapped around an ellipse as in Fig. 6.2a.
Since every timelike vector is tB for some unit timelike B, this describes Exp
on such vectors completely.

When det B = 0, B lies in the intersection of SL(2;R) and its geometric
tangent space. In the slice shown in Figs. 6.1, 6.2 it is a pair of lines. (Recall
that such a hyperboloid contains two straight lines through each point - a
fact useful in making string models of it but, contrary to many schoolbooks,
nothing to do with cooling tower design.) For SL(2;R) proper, the
intersection is a cone (a union of straight lines) - the null cone of the tangent
space, symmetrical around the vector B of Fig. 6.1. (The tangent space has
orthonormal basis the bound vectors $d_I^{\leftarrow}(b_2)$, $d_I^{\leftarrow}(b_3)$, $d_I^{\leftarrow}(b_4)$, orthogonal to
$d_I^{\leftarrow}(I) = d_I^{\leftarrow}(b_1)$, of which only the first is timelike. So with this basis the
“timelike axis”, vertical in Fig. IV.1.5 for $H^3$, is parallel to the $a^2$ axis.) The
effect of freeing B from I to give $\hat B$, then adding that to I, is to leave the
vector tip where it started (Fig. 6.2b). The line $\{\, tB \mid t \in \mathbf{R} \,\}$, thought
of as part of the geometric tangent space, is mapped to SL(2;R) by simple
inclusion.

Finally, suppose det B = −1. Then

$$\operatorname{Exp}(tB) = \cosh(t)\,I + \sinh(t)\,\hat B ,$$

so $\{\, tB \mid t \in \mathbf{R} \,\}$ is mapped to a hyperbola as in Fig. 6.2c.

6.04. Relation to $\exp_I$. The three different formulae we have used in
studying Exp do fit neatly together. To demonstrate this we could appeal to more
theorems on convergent power series (again generalised from real numbers to
operators) or examine carefully the behaviour of Exp(B) as det B tends to 0.
But we need not. This map coincides exactly with the differential-geometric
map $\exp_I$ from a tangent space to a manifold defined in 2.04 (using the
metric tensor we have chosen), so that smoothness follows by Exercise 2.2. To
establish this agreement, we need only prove it on rays $\{\, tB \mid t \in \mathbf{R} \,\}$, since
every point is on some ray. Thus it suffices to prove that the curves

$$E_B : \mathbf{R} \to SL(2;\mathbf{R}) : t \mapsto \operatorname{Exp}(tB)$$

studied in 6.03 are geodesics, and that the map

$$B \mapsto E_B^*(0)$$

given by differentiating them at 0 is the identity. (We have affine coordinates
on $L(\mathbf{R}^2;\mathbf{R}^2)$, so the differentiation of curves in it is simple.)

For the case det B = 0, both statements are trivial.

In the case $\det B = d^2$,

$$E_B(t) = \cos(td)\,I + \sin(td)\,\tfrac{1}{d}B$$
$$E_B^*(t) = -d\sin(td)\,I + \cos(td)\,B$$
$$E_B^*(0) = B .$$

In the case $\det B = -d^2$,

$$E_B(t) = \cosh(td)\,I + \sinh(td)\,\tfrac{1}{d}B$$
$$E_B^*(t) = d\sinh(td)\,I + \cosh(td)\,B$$
$$E_B^*(0) = B .$$

It remains to check that each $E_B$ is a geodesic. Differentiating again,
we have “acceleration vectors” as follows:

$$\frac{d}{dt}E_B^*(t) = -d^2\cos(dt)\,I - d\sin(dt)\,B = -d^2\,(E_B(t)) , \quad \text{if } \det B > 0,$$

$$\frac{d}{dt}E_B^*(t) = d^2\cosh(dt)\,I + d\sinh(dt)\,B = d^2\,(E_B(t)) , \quad \text{if } \det B < 0.$$

But $\pm E_B(t)$ is always orthogonal to the tangent plane at $E_B(t)$, by Exercise
VIII.1.4, hence in each case $E_B$ is a geodesic by Theorem 5.01.
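
Both signs are covered by the single statement that the acceleration equals $-(\det B)\,E_B(t)$. The sketch below checks this numerically with a central second difference; the step size and the tolerance one should expect are assumptions of the sketch, not part of the text.

```python
import math

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def E(B, t):
    """E_B(t) = Exp(tB) for traceless B, from the closed forms of 6.02."""
    D = det(B)
    d = math.sqrt(abs(D))
    if D > 0:
        c, s = math.cos(t * d), math.sin(t * d) / d
    elif D < 0:
        c, s = math.cosh(t * d), math.sinh(t * d) / d
    else:
        c, s = 1.0, t
    return [[c * (1.0 if i == j else 0.0) + s * B[i][j] for j in range(2)]
            for i in range(2)]

def acceleration(B, t, h=1e-4):
    """Second derivative of E_B at t by a central second difference."""
    plus, mid, minus = E(B, t + h), E(B, t), E(B, t - h)
    return [[(plus[i][j] - 2 * mid[i][j] + minus[i][j]) / h ** 2
             for j in range(2)] for i in range(2)]

def residual(B, t):
    """Largest entry of  acceleration + (det B) * E_B(t)."""
    acc, pos, D = acceleration(B, t), E(B, t), det(B)
    return max(abs(acc[i][j] + D * pos[i][j])
               for i in range(2) for j in range(2))

for B in ([[0.0, 1.3], [-1.3, 0.0]],      # timelike, det B > 0
          [[1.0, 1.0], [-1.0, -1.0]],     # null, det B = 0
          [[2.0, 0.0], [0.0, -2.0]]):     # spacelike, det B < 0
    print(det(B), residual(B, 0.5))
```

Since the acceleration is a multiple of the position vector $E_B(t)$, which is orthogonal to the tangent plane of SL(2;R) there, Theorem 5.01 applies.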

6.05. Other Geodesics. Our “positive definite” visual habits and Fig. 6.1
may not suggest it, but SL(2;R) is quite as symmetrical as a sphere, in the
metric tensor we are using. (Indeed, its definition $\{\, A \mid A \cdot A = 1 \,\}$ is just
like that of a sphere.) Multiplication $M_A$ by any unimodular operator A is
an orthogonal operator on $L(\mathbf{R}^2;\mathbf{R}^2)$ by Exercise IV.2.3, since for any B

$$(M_A(B)) \cdot (M_A(B)) = (AB) \cdot (AB) = \det(AB) = \det A \det B = \det B = B \cdot B$$

so it maps SL(2;R) isometrically to itself, carrying geodesics to geodesics
(and its inverse carries them back). Thus since it takes I to A, we find all
the geodesics through A (Figs. 6.3b,c) by applying $M_A$ to those through I
(Fig. 6.3a) which we have found already.
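
By polarisation the same computation shows that $M_A$ preserves the dot product of any two operators, not just the "squared length" of one. A quick sketch with the bilinear form written out in matrix entries (function names are illustrative):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dot(A, B):
    """Determinant metric tensor: (1/2)(as + dp) - (1/2)(br + cq)."""
    (a, b), (c, d) = A
    (p, q), (r, s) = B
    return 0.5 * (a * s + d * p) - 0.5 * (b * r + c * q)

A = [[2, 3], [1, 2]]            # unimodular: det A = 1
B = [[1, 2], [3, 4]]
C = [[0, -1], [5, 2]]

print(dot(B, C), dot(matmul(A, B), matmul(A, C)))   # equal values
print(dot(A, A))                                    # A . A = det A = 1
```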

6.06. Gaps. The trace function on $L(\mathbf{R}^2;\mathbf{R}^2)$ is as “natural” as the
determinant (and as this example illustrates, is closely related to it in general). Since
it respects addition, and for $B \in T_I(SL(2;\mathbf{R}))$ we have tr(B) = 0, 6.02 gives

$$\operatorname{tr}(\operatorname{Exp}(B)) = \begin{cases}
2\cos(d) , & B \text{ timelike} \\
2 , & B \text{ null} \\
2\cosh(d) , & B \text{ spacelike.}
\end{cases}$$

Thus for all B we have $\operatorname{tr}(\operatorname{Exp}(B)) \geq -2$, with equality only if $\sqrt{\det B}$
is real and an odd multiple of $\pi$, giving Exp(B) = −I. (This is clearly
analogous to the situation of Fig. 2.3; on the unit sphere it is exactly the
sets $\{\, x \mid x \cdot x = (2k - 1)^2\pi^2 \,\}$, for each $k \in \mathbf{N}$, that $\exp_p$ maps to the point
opposite p. But in that case the sets so defined are spheres. Here they are
not compact.)

Fig. 6.3
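
The trace condition gives a simple test for matrices that Exp cannot reach. The sketch below encodes only what the text states: the trace formula for Exp(B), and the necessary condition that a unimodular A in the image has tr A > −2 or equals −I. The function names and the sample matrices are ours.

```python
import math

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def trace_of_exp(B):
    """tr(Exp(B)) for traceless B, from the three cases of 6.02."""
    D = det(B)
    d = math.sqrt(abs(D))
    if D > 0:
        return 2 * math.cos(d)
    if D < 0:
        return 2 * math.cosh(d)
    return 2.0

def passes_trace_test(A):
    """Necessary condition for a unimodular A to be Exp(B) for some B."""
    minus_identity = [[-1, 0], [0, -1]]
    return A[0][0] + A[1][1] > -2 or A == minus_identity

print(trace_of_exp([[0.0, math.pi], [-math.pi, 0.0]]))   # about -2
print(passes_trace_test([[-2, 0], [0, -0.5]]))   # trace -2.5: not reached
print(passes_trace_test([[-1, 1], [0, -1]]))     # trace -2, not -I: not reached
print(passes_trace_test([[-1, 0], [0, -1]]))     # -I itself is reached
```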

So the trace slices SL(2;R) up (Fig. 6.4) into the Exp images of spacelike,
null and timelike vectors. No point on a null or spacelike geodesic through
−I (except −I itself) is reached by a geodesic from I, though SL(2;R) is
clearly geodesically complete, cf. 2.03.

However such a point A can easily be reached from I by a timelike curve.
Indeed, it can be reached by a timelike “broken geodesic” with a single jump
in direction, say at J. So if “physical effects” are considered to propagate
along timelike geodesics, this means that events at I can have effects at A if
some “interaction” happens at J.

Fig. 6.4

Fig. 6.5

6.07. The Covering Space. If we were seriously considering SL(2;R) as a
model for physical spacetime, our previous sentence would lead at once to a
genuine logical difficulty (unlike 4.05). If the effects of an event can
propagate into that event’s own past, we have the Time Travel Paradox: what
is to stop a man murdering his mother at a time before his own
conception? (This is stricter than the usual, Oedipal formulation. It’s a wise child
that knows its own father.) Some modern physical theories grapple with this
problem, but most simply forbid it. The geometrical analogue, not
paradoxical until physically interpreted in this particular way, is the existence
of closed timelike curves like those in SL(2;R). (Other physical applications
of pseudo-Riemannian geometry, such as to electrical circuit theory [Smale],
do not associate paradoxes with such curves.) It happens that SL(2;R) has
a close relative without this feature, whose construction we may sketch as
follows.

Fig. 6.6

Replace (Fig. 6.6) the rectangular coordinates $(a^1, a^2, a^3, a^4)$ we have
been using on $L(\mathbf{R}^2;\mathbf{R}^2)$ by “cylindrical coordinates” $(r, \theta, a^3, a^4)$ defined at
all points except where $a^1 = a^2 = 0$. This is not strictly a chart on

$$M = L(\mathbf{R}^2;\mathbf{R}^2) \setminus \{\, (0, 0, a^3, a^4) \mid a^3, a^4 \in \mathbf{R} \,\} ,$$

since $\theta$ takes values in the circle $S^1$ of possible angles, with 0 and $2\pi$ identified.
It gives a map

$$M \to \mathbf{R} \times S^1 \times \mathbf{R}^2 , \quad \text{not } M \to \mathbf{R}^4 .$$

But since SL(2;R) lies in M, this “pseudochart” lets us treat it as a
submanifold of $\mathbf{R} \times S^1 \times \mathbf{R}^2$. Using the exponential map
$\exp_1 : \mathbf{R} \to S^1 : x \mapsto (\text{angle of } e^{ix})$ of Fig. 2.2, we can define

$$P : \mathbf{R}^4 \to \mathbf{R} \times S^1 \times \mathbf{R}^2 : (w, x, y, z) \mapsto (w, \exp_1(x), y, z)$$

which is obviously a local diffeomorphism (VII.2.02). This gives

$$\widetilde{SL}(2;\mathbf{R}) = P^{\leftarrow}(SL(2;\mathbf{R}))$$

as a sub-3-manifold of $\mathbf{R}^4$, and

$$p : \widetilde{SL}(2;\mathbf{R}) \to SL(2;\mathbf{R})$$

defined by restricting P is again a local diffeomorphism. We then “lift” the
metric tensor field on SL(2;R) to one on $\widetilde{SL}(2;\mathbf{R})$. If v, w are tangent vectors
in $T_x(\widetilde{SL}(2;\mathbf{R}))$ we define their dot product by $(D_xp(v)) \cdot (D_xp(w))$. The
result is non-degenerate, and has the same signature as that on SL(2;R),
since $D_xp$ is an isomorphism for each $x \in \widetilde{SL}(2;\mathbf{R})$.

The effect is to “unwind” SL(2;R) as $\mathbf{R}$ “unwinds” the circle (Fig. 6.7a).
The space $\widetilde{SL}(2;\mathbf{R})$ may be thought of as a “spiral copy” (Fig. 6.7b) of
SL(2;R), as long as it is understood that it does not “spiral in” or “spiral
out”. Each small piece has exactly the geometry of a corresponding piece
of SL(2;R).

Fig. 6.7

This is an ad hoc construction of the universal cover of SL(2;R). A
systematic construction, for the universal cover of a general space, a very
useful tool, is given in most topology and Lie group texts.

Unlike SL(2;R), $\widetilde{SL}(2;\mathbf{R})$ has an atlas consisting of a single chart using
all of $\mathbf{R}^3$. The slice corresponding to Fig. 6.1 can be charted with $\mathbf{R}^2$, and
Fig. 6.7c uses such a chart to draw the geodesics through a particular point
$\tilde I$ with $p(\tilde I) = I$.

6.08. Aside on Lie Group Theory. The group structure on SL(2;R) can
also be “lifted”, to make $\widetilde{SL}(2;\mathbf{R})$ a Lie group with identity $\tilde I$. It is then
simple to prove that if $A \in \widetilde{SL}(2;\mathbf{R})$ does not lie in the image of $\widetilde{SL}(2;\mathbf{R})$’s
exponential map (Fig. 6.7c), then nor does $A^k$ for any integer $k \neq 0$. This
illustrates a geometric difference between algebraic groups, those defined by
an algebraic matrix equation as SL(2;R) is by det A = 1, and Lie groups in
general. By a recent theorem [Markus], in an algebraic group every A has
some power $A^k$ in the image of Exp (a result Markus was led to by questions
in differential equations). Thus unlike SL(2;R), $\widetilde{SL}(2;\mathbf{R})$ cannot turn up as
an algebraic group - a result which happens to be weaker than the fact that
$\widetilde{SL}(2;\mathbf{R})$ cannot turn up as a group of n × n matrices at all. It has no “faithful
finite-dimensional representations”, even non-algebraic ones.

Our definition of the pseudo-Riemannian structure on SL(2;R) and
$\widetilde{SL}(2;\mathbf{R})$ does not generalise, since det is only quadratic on n × n
matrices when n = 2. But it is true for Lie groups in general that the Exp defined
by power series can be realised as a “geodesic” exponential map with a
suitable metric tensor field. For some groups this is Riemannian (for instance
Spin(3), the unit quaternion group, is topologically the 3-sphere $S^3$ and its
exponential map is the analogue of Fig. 2.3). But since Exp is always defined
on the whole tangent space at I (the Lie algebra) of the group, the group
must become geodesically complete. In a connected, geodesically complete
Riemannian manifold any two points are joined by a geodesic (see for
instance [Spivak] for a proof), so no Riemannian structure fits this and similar
examples.

6.09. “Relativistic SHM”. This remark is not physics (not that of our
universe, anyway) and not mathematics (“observer” etc. being physical notions).
We make it, briefly and loosely, as an exercise for the imagination.

The space SL(2;R) is wrong-dimensional to satisfy Einstein’s equation,
and the 4-dimensional analogue we could get by starting with

$$\{\, x \in \mathbf{R}^5 \mid (x^1)^2 + (x^2)^2 - (x^3)^2 - (x^4)^2 - (x^5)^2 = 1 \,\}$$

would mean “negative energy density everywhere” (points the reader may
elaborate after digesting Chap. XII). But if we interpret some particular
geodesic c as “the history of an observer Q”, and another as “the history of
a particle P watched by Q”, Fig. 6.7c shows that Q “sees P go to and fro,
returning to him at times $\pi$ apart as measured along c”. The periodicity is
independent of the velocity that Q imputes to P at their meetings.

This behaviour of particles is what a physicist at the centre of a linear, 
Newtonian inward field of force would see. But here, every world-line shows 
“Simple Harmonic Motion” relative to every other, with the same apparent 
period. Every physicist is equally “central”. 

6.10. Effects on R 2 . Exercises 4-6 analyse the nature of { Exp (tB) | t E R } 
as a family of operators on R 2 . According as B is timelike, null or spacelike 
the flows 

R 2 x R —► R 2 : (x,i) i—► (Exp(tB))* 

look in suitable coordinates (x,y) like Fig. 6.8a, b or c respectively. 

Each is a family of “rotations” with respect to some “metric tensor” 
(degenerate in the case det B = 0), which is unique up to a scalar factor. In
the original coordinates on $\mathbf{R}^2$, of course, they may look as in Fig. 6.9. It
may be seen how, as det B tends to 0 from either side, the geometry of the
flow tends to the degenerate case. The non-surjectivity of Exp now appears
more geometrically natural. If tr A = −2, A = −Exp(B) for some null B
and cannot be reached from I by such a family unless it is −I, as it switches
the sides of the fixed line L in Fig. 6.9b and reverses it. If tr A < −2, similar
remarks apply, in terms of Fig. 6.9c.
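
The three kinds of flow can be explored numerically. The following is a minimal sketch, assuming that Exp agrees with the matrix exponential and that timelike, null and spacelike correspond to det B > 0, = 0 and < 0 for traceless B; it uses the identity B² = −(det B) I to sum the series in closed form. The function names are ours, chosen for illustration.

```python
import math

def exp_traceless(B, t=1.0):
    """Exponential of t*B for a traceless 2x2 matrix B = ((a, b), (c, -a)).

    Uses B*B = -det(B)*I (Cayley-Hamilton with tr B = 0), which sums the
    exponential series in closed form.
    """
    (a, b), (c, d) = B
    assert abs(a + d) < 1e-12, "B must be traceless"
    det = a * d - b * c
    if det > 0:                      # "timelike": rotation-like flow
        w = math.sqrt(det)
        f, g = math.cos(t * w), math.sin(t * w) / w
    elif det < 0:                    # "spacelike": boost-like flow
        w = math.sqrt(-det)
        f, g = math.cosh(t * w), math.sinh(t * w) / w
    else:                            # "null": shear-like flow
        f, g = 1.0, t
    # Exp(tB) = f*I + g*B
    return ((f + g * a, g * b), (g * c, f + g * d))

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def trace2(A):
    return A[0][0] + A[1][1]

timelike = ((0.0, -1.0), (1.0, 0.0))   # det = 1
null = ((0.0, 1.0), (0.0, 0.0))        # det = 0
spacelike = ((0.0, 1.0), (1.0, 0.0))   # det = -1

for name, B in [("timelike", timelike), ("null", null), ("spacelike", spacelike)]:
    A = exp_traceless(B, 0.7)
    print(name, "det =", round(det2(A), 12), "tr =", round(trace2(A), 6))
```

In each case the determinant is 1; the trace lies between −2 and 2 for the timelike example, equals 2 for the null one and exceeds 2 for the spacelike one, so no operator with trace below −2 is reached.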



Fig. 6.9

Fig. 6.10


6.11. Aside on Crystal Symmetries. A crystal has at most the symmetries
of a lattice of dots like those in Fig. 6.10a, b, and their 3-dimensional
analogues. (It may have fewer as in c.) Choose one dot as origin, and a basis
like the pairs of vectors shown in Fig. 6.10. Then any linear operator carrying
dots to dots must have a matrix with only integer entries, since its columns
give the (integer) coordinates of lattice points. Hence irrespective of basis,
tr A is an integer.

A crystal symmetry A in two (three) dimensions must obviously preserve
area (volume), at least up to sign. Suppose $A : \mathbf{R}^2 \to \mathbf{R}^2$ has det A = 1, and
is a rotation with respect to some inner product on $\mathbf{R}^2$. Exercises 4-6 show
that if $A \neq \pm I$, then $-2 < \operatorname{tr} A < 2$. Thus if tr A is an integer, it must be 0
or ±1 (Fig. 6.6a). Hence (Exercise 7), by the unique appropriate metric it is
a turn through $\frac{\pi}{2}$, $\frac{\pi}{3}$ or $\frac{2\pi}{3}$. Nothing like a $\frac{2\pi}{5}$ turn is possible.
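
The restriction is easy to confirm by brute force. This sketch (our own illustration, with entries limited to a small range purely to keep the search finite) lists integer matrices with det A = 1 and |tr A| < 2 and records the least n with Aⁿ = I.

```python
from itertools import product

def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

IDENTITY = ((1, 0), (0, 1))

def order(A, max_order=12):
    """Smallest n >= 1 with A**n == I, or None if none is found up to max_order."""
    P = A
    for n in range(1, max_order + 1):
        if P == IDENTITY:
            return n
        P = matmul(P, A)
    return None

# Integer matrices with small entries, det 1 and |tr| < 2, grouped by trace.
orders_by_trace = {}
for a, b, c, d in product(range(-3, 4), repeat=4):
    if a * d - b * c == 1 and abs(a + d) < 2:
        orders_by_trace.setdefault(a + d, set()).add(order(((a, b), (c, d))))

print(orders_by_trace)
```

Each trace yields a single order: 3 for tr A = −1, 4 for tr A = 0 and 6 for tr A = 1, in agreement with turns through $\frac{2\pi}{3}$, $\frac{\pi}{2}$ and $\frac{\pi}{3}$.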



Fig. 6.11

Hence, even in three dimensions, by Exercise 8 any crystal symmetry
A (even with det A < 0) keeping some point fixed and preserving Euclidean
lengths must have (at least) one of $A^2$, $A^3$, $A^4$, $A^6$ equal to I.

The operators in SL(2;R) with |tr A| > 2 (Fig. 6.11b) have been called 
relativistic crystal symmetries [Ascher and Janner]. Crystallographic symmetries
in Euclidean n-space, even including translations, can be systematically
enumerated [Schwarzenberger(2)]. But not even the crystallographic “point 
groups” (symmetries keeping a given point fixed) in 4 dimensions have been 
classified for Minkowski space. 


Exercises IX.6 

All operators in these exercises are on $\mathbf{R}^2$, unless otherwise indicated.

1. Show by writing out the matrices that for all $B \in T_I(SL(2;\mathbf{R}))$, we
have det(Exp(B)) = 1. (Use 6.02 and the identities
$\cos^2 x + \sin^2 x = 1 = \cosh^2 x - \sinh^2 x$.)

2. Either

Show from 6.02 and the standard identities

\begin{align*}
\sin(t+s) &= \sin(t)\cos(s) + \cos(t)\sin(s) \\
\cos(t+s) &= \cos(t)\cos(s) - \sin(t)\sin(s) \\
\sinh(t+s) &= \sinh(t)\cosh(s) + \cosh(t)\sinh(s) \\
\cosh(t+s) &= \cosh(t)\cosh(s) + \sinh(t)\sinh(s)
\end{align*}

that Exp(tB) has the 1-parameter subgroup property

\[ \operatorname{Exp}((t+s)B) = (\operatorname{Exp}(tB)) \circ (\operatorname{Exp}(sB)) \tag{$*$} \]

or

If you already know ($*$) in greater generality from Lie group theory,
derive the above formulae from it.

3. Show that every $A \in SL(2;\mathbf{R})$ with tr A > −2 is Exp(B) for some B.

4. a) If tr B = 0, det B = 1, then by 6.01 $B^2 = -I$. Deduce that if x is
any non-zero vector in $\mathbf{R}^2$, x and Bx are linearly independent. (Hint:
if v = ax + bBx = 0, then Bv = 0 also.)

b) Using coordinates $(x^1, x^2)$ referred to the basis {x, Bx}, show that
any symmetric bilinear form G on $\mathbf{R}^2$ such that

\[ G(Bx, By) = G(x, y) \quad \forall x, y \in \mathbf{R}^2, \]

can be written (for some $g \in \mathbf{R}$), as

\[ G((x^1,x^2),(y^1,y^2)) = g(x^1 y^1 + x^2 y^2). \]




c) Deduce that if A = Exp(B) for some timelike B, and $A \neq \pm I$, the
same holds for symmetric bilinear forms G with

\[ G(Ax, Ay) = G(x, y), \quad \forall x, y \in \mathbf{R}^2. \]

d) Show that if $\det B = d^2$, $d \in \mathbf{R}$, then coordinates on $\mathbf{R}^2$ referred to a
basis $x, \frac{1}{d}Bx$ give Exp(tB) the matrix

\[ \begin{bmatrix} \cos(td) & \sin(td) \\ -\sin(td) & \cos(td) \end{bmatrix}. \]


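5. Suppose |tr A| > 2, det A = 1.

a) Show using the characteristic equation of A (cf. I.3.13) that A has
two distinct real eigenvalues, and that choosing basis vectors $b_1$, $b_2$
belonging to them gives A the matrix

\[ \begin{bmatrix} 1/\lambda & 0 \\ 0 & \lambda \end{bmatrix}, \quad \text{some } \lambda \neq 0, \pm 1. \]

b) Deduce that in $b_1$, $b_2$ coordinates, any non-zero symmetric bilinear
form G on $\mathbf{R}^2$ such that

\[ G(Ax, Ay) = G(x, y) \quad \forall x, y \in \mathbf{R}^2 \]

has the formula

\[ G((x^1,x^2),(y^1,y^2)) = g(x^1 y^2 + x^2 y^1), \quad \text{some } g \neq 0. \]

Show that G is then a metric tensor on $\mathbf{R}^2$ with signature 0, and that
$a_1 = \frac{1}{\sqrt{2|g|}}(b_1 + b_2)$, $a_2 = \frac{1}{\sqrt{2|g|}}(b_1 - b_2)$ is an orthonormal basis for it.

c) Show that in $a_1$, $a_2$ coordinates A has the matrix

\[ \frac{1}{2\lambda}\begin{bmatrix} 1+\lambda^2 & 1-\lambda^2 \\ 1-\lambda^2 & 1+\lambda^2 \end{bmatrix} \]

and that this equals

\[ \pm\begin{bmatrix} \cosh t & \sinh t \\ \sinh t & \cosh t \end{bmatrix} \]

for some unique $t \in \mathbf{R}$.

d) Deduce that A = ±Exp(tB), where in $a_1$, $a_2$ coordinates

\[ [B] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]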


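6. a) Show, similarly to Exercise 5, that any $A \in SL(2;\mathbf{R})$ with tr A = ±2
preserves exactly one (degenerate) symmetric bilinear form G on $\mathbf{R}^2$,
up to a scalar factor. Prove that in suitable coordinates on $\mathbf{R}^2$

\[ G((x^1,x^2),(y^1,y^2)) = g(x^2 y^2), \qquad [A] = \pm\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} \]

for some $g, t \in \mathbf{R}$.

b) Deduce that A = ±Exp(tB), where $[B] = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ in the same coordinates.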
7. a) Deduce from 6.06 that if tr(Exp(B)) = 1, 0, or −1, then

\[ \operatorname{Exp}(B) = \begin{cases} \cos\left(\frac{\pi}{3}\right) I \pm \sin\left(\frac{\pi}{3}\right) B, \\ \pm B, \text{ or} \\ \cos\left(\frac{2\pi}{3}\right) I \pm \sin\left(\frac{2\pi}{3}\right) B \end{cases} \quad \text{respectively.} \]

b) Deduce using Exercise 3, Exercise 4 that if det A = 1 and tr A =
1, 0, −1, then with respect to some inner product (unique up to a
scalar) A is a turn through $\frac{\pi}{3}$, $\frac{\pi}{2}$ or $\frac{2\pi}{3}$ respectively.

8. a) Show that an orthogonal operator A on $\mathbf{R}^3$ with its usual inner product
has in some coordinates the matrix

\[ \begin{bmatrix} \pm 1 & 0 & 0 \\ 0 & a & b \\ 0 & c & d \end{bmatrix}, \quad \text{where } \begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ is} \]

the matrix of a plane rotation.

b) Deduce from Exercise 7 that if tr A is an integer, I = $A^2$, $A^3$, $A^4$
or $A^6$.


For those with a bit of Lie group theory 

9. a) Find a metric tensor field on $GL^+(2;\mathbf{R}) = \{\, A : \mathbf{R}^2 \to \mathbf{R}^2 \mid \det A > 0 \,\}$
such that $\exp_I$ coincides with Exp. (Hint: show that

\[ GL^+(2;\mathbf{R}) \to SL(2;\mathbf{R}) \times \mathbf{R} : A \mapsto \left((\det A)^{-1/2} A,\ \log(\det A)\right), \]

with the additive structure on $\mathbf{R}$, is a Lie group isomorphism.)
b) Is only one signature possible for such a metric tensor, up to sign? 
PANORAMIX: So, Obelix, what is Helvetia like?
OBELIX: Flat.

Asterix chez les Helvètes


1. Flat Spaces 

In treating the geometry of manifolds that were not simply nice flat affine 
spaces we have paid major attention to parallel transport along curves; the 
feature of general spaces that distinguishes them most dramatically is the 
disappearance of “absolute” parallelism. This prompts 

1.01. Definition. A connection $\nabla$ on a manifold M is locally flat (or “M with
$\nabla$” is), if any $p \in M$ has a neighbourhood $U_p$ such that for $q \in U_p$, parallel
transport gives the same result along any curve in $U_p$ from p to q. If M has a
metric tensor, M is locally flat if the corresponding Levi-Civita connection is.

M is globally flat if parallel transport between any $p, q \in M$ is the same
for any curve in M from p to q. Fig. 1.1 illustrates a locally but not globally
flat M. Parallel transport along any curve confined to $U_1$ or to $U_2$ gives
the same result as along another confined to the same region. However, 
circumnavigating M gives a result different from the identity that is parallel 
transport along a curve that stays at one point. (Examples without “edges” 
are harder to draw, since none embeds in Euclidean 3-space with a locally 
flat metric.) 

If M is simply connected (that is, any curve from p to q can be deformed,
via curves from p to q, into any other) local flatness implies global. This
is an example of the wide field of relations between the possible metrics/
connections/curvatures on a manifold and its topological “shape”. Practically
all these relations involve algebraic topology. This handles “india-rubber
geometry” rather as linear algebra handles another kind, but is a lot
more complicated and a lot less complete. We shall illustrate the relationship
further (1.04) by quoting again one of the few results of algebraic topology
whose statement, at least, does not require machinery that would require
another book to explain adequately.

1.02. Parallel Fields and Flatness. On a general M there may be no non-zero
parallel vector fields (defined in VIII.4.01) at all, or only a few. For
example on the Klein bottle with an appropriate metric (not that induced by
the position in $\mathbf{R}^3$ shown, for which no non-zero field is parallel) the vector
field shown in Fig. 1.2 is parallel. But for no metric does this manifold admit 
more than one parallel vector field (why?) up to scalar multiplication, and 
for most it admits none. 



Fig. 1.2 


In general: 

1.03. Lemma. The set of all parallel vector fields on a manifold M with
connection forms a vector space PM of dimension $\leq \dim M$, with equality if
and only if M is globally flat. (Similarly for open subsets of M, such as the
domain of a chart.)

Proof. By the linearity of covariant differentiation, sums and scalar multiples
of parallel fields are again parallel. Since clearly any parallel vector field is
specified by its value at any $p \in M$,

\[ \hat{p} : PM \to T_pM : v \mapsto v_p = v(p) \]

is thus linear and injective, so that $\dim(PM) \leq \dim(T_pM) = \dim M$.

If equality holds, $\hat{p}$ is an isomorphism and hence any $v_p \in T_pM$ is part
of a unique parallel vector field v. Thus v restricted to any curve c from p to
q is parallel, hence the result of parallel transport to q is v(q), independently
of c, so M is globally flat.

Conversely, if M is globally flat we can extend any $v_p \in T_pM$ to a
parallel field $v : q \mapsto v_q$ by parallel transport along arbitrary curves, smooth
by Exercise 2, so $\hat{p}$ is an isomorphism and equality holds. □

1.04. Corollary. The two-dimensional sphere $S^2$ does not admit any metric
which makes it globally flat.

Proof. By the Hairy Ball Theorem (cf. VII.4) any vector field is zero somewhere
on the sphere. Hence a parallel one is zero everywhere, so
$\dim(PS^2) = 0 \neq 2 = \dim(S^2)$ for any metric. □

Since $S^2$ is simply connected this means that it cannot be locally flat
either; we shall not go into details, but the reader acquainted with the fundamental
group can readily supply them. (The reader who is not can gain
insight from considering how to construct an ad hoc proof from the material
of this chapter.) The sphere can of course be flat somewhere - Fig. IX.1.1
shows two embeddings each with three flat regions - but not around every
point.

We shall not be using this result, but it illustrates well the constraints 
the global topology of a manifold can put on its local structure. (A similar use 
of the Hairy Ball Theorem, incidentally, shows that $S^2$ admits no indefinite
metric tensor field.) We encounter implications of local structure for global 
topology in XII.2.03. When a local structure is interpreted as the presence of 
hydrogen atoms or physicists, the algebraic topology needed to discuss such 
relationships exactly becomes another mathematical theory the cosmologist 
should be at home in. 

The reader may have felt unhappy about Fig. 1.1 being described as flat, 
even locally. We are concerned, though, with intrinsic geometry; if we cut 
it we could spread it flat without wrinkles. Its geodesics would then appear 
as straight lines, the angle-sum of any triangle left intact by the cut would 
be 180°, and so forth: its local geometry is just that of the plane. This is in 
violent contrast to the sphere, no part of which can be matched to flat paper 
without distorting one or the other - as cartographers and anyone who has 
watched a shop-assistant gift-wrap a beachball have particularly good reason
to know. 
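
The contrast can be made quantitative. As a small sketch under our own conventions (unit sphere in $\mathbf{R}^3$, vertices given as unit vectors, sides taken to be great-circle arcs; the function names are illustrative), the angle-sum of a geodesic triangle on the sphere can be computed directly and compared with the 180° of the plane.

```python
import math

def angle_at(A, B, C):
    """Angle at vertex A of the geodesic triangle ABC on the unit sphere."""
    def tangent_towards(P, Q):
        # Component of Q orthogonal to P: direction of the great circle from P to Q.
        dot = sum(p * q for p, q in zip(P, Q))
        t = [q - dot * p for p, q in zip(P, Q)]
        norm = math.sqrt(sum(x * x for x in t))
        return [x / norm for x in t]
    u, v = tangent_towards(A, B), tangent_towards(A, C)
    cos_angle = sum(x * y for x, y in zip(u, v))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def angle_sum_degrees(A, B, C):
    total = angle_at(A, B, C) + angle_at(B, C, A) + angle_at(C, A, B)
    return math.degrees(total)

# One octant of the unit sphere: three right angles.
octant = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(angle_sum_degrees(*octant))   # 270 up to rounding
```

The excess over 180° shrinks with the triangle, which is why a small enough patch of sphere is hard to tell from flat paper by measurement, although it is never exactly flat.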

Our next results show that Definition 1.01 amounts to requiring M to 
be exactly like an affine space, as far as local internal measurements are 
concerned. From inside M , you can’t say flatter than that. 

1.05. Lemma. M with metric G is locally flat if and only if around every
$p \in M$ there is a chart $\phi : U \to X$ such that $G|_U$ is given by $\phi$ as a constant
metric tensor in the affine sense (VII.3.05).

Proof. If around every point there is such a chart, parallel transport within
its domain corresponds to the usual parallelism in X, which is independent
of curves (Exercise VIII.6.4), so M is locally flat.

If M is locally flat, let V be an open region around $p \in M$ in which
parallel transport is independent of curves. Choose a basis $(u_1)_p, \dots, (u_n)_p$
for $T_pM$. Since V with $G|_V$ is globally flat we can extend the basis vectors
to parallel vector fields $u_1, \dots, u_n$ on V. By the symmetry of the Levi-Civita
connection,

\[ 0 = T(u_i, u_j) = \nabla_{u_i} u_j - \nabla_{u_j} u_i - [u_i, u_j], \quad \forall i, j. \]

But since the $u_i$ are parallel, their covariant derivatives with respect to all
vectors, and hence to each other, vanish.

Thus

\[ 0 = [u_i, u_j], \quad \forall i, j. \]

And so by VII.7.04 there is a chart $\phi : U \to \mathbf{R}^n$, $U \subseteq V$, around p whose $\partial_i$
are exactly the $u_i$. With respect to $\phi$, then, for any $q \in U$ we have

\begin{align*}
g_{ij}(q) &= G_q(\partial_i, \partial_j) = G_q(u_i(q), u_j(q)) \\
&= G_q(\tau(u_i(p)), \tau(u_j(p))) && \text{where $\tau$ is parallel transport along any curve from $p$ to $q$,} \\
&= G_p(u_i(p), u_j(p)) && \text{since $\nabla$ is compatible with $G$} \\
&= g_{ij}(p).
\end{align*}

Thus G is given by $\phi$ as the metric tensor on $\mathbf{R}^n$ with constant coefficients $g_{ij}$,
which is itself constant. □

Notice that both defining characteristics of the Levi-Civita connection 
are involved in this proof. 
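
Lemma 1.05 asserts only that some chart makes the components constant; in another chart the same flat metric may have varying components. As an illustrative sketch (our own example, not taken from the text), the flat metric of the punctured plane has components diag(1, r²) in polar coordinates, obtained here by pulling back the constant Cartesian components through the chart change.

```python
import math

def polar_metric(r, theta):
    """Components of the flat metric dx^2 + dy^2 in polar coordinates (r, theta).

    The chart change is x = r cos(theta), y = r sin(theta); the components
    are g_ij = sum_k (d x^k / d q^i)(d x^k / d q^j) with q = (r, theta).
    """
    # Derivatives of (x, y) with respect to r and to theta.
    d_r = (math.cos(theta), math.sin(theta))
    d_theta = (-r * math.sin(theta), r * math.cos(theta))
    basis = (d_r, d_theta)
    return [[sum(a * b for a, b in zip(basis[i], basis[j])) for j in range(2)]
            for i in range(2)]

for r in (0.5, 1.0, 2.0):
    g = polar_metric(r, 0.3)
    print(r, [[round(x, 12) for x in row] for row in g])
```

The component $g_{\theta\theta} = r^2$ varies from point to point although the manifold is flat, so non-constant $g_{ij}$ in a given chart proves nothing; it is the existence of a chart with constant components that matters.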

1.06. Corollary. M is locally flat if and only if around every point there is
a chart $\phi : U \to X$, such that $c : J \to M$ with $c(J) \subseteq U$ is a geodesic if and
only if $\phi \circ c$ is affine.

Proof. If M is locally flat, the geodesics are thus by 1.05 and Exercise IX.4.2.

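Conversely, if $\phi : U \to X$ is such a chart, without loss of generality suppose
$X = \mathbf{R}^n$, and use coordinates. At any $(x^1, \dots, x^n)$ the vector $\partial_i$ is represented by
the affine curve $c : t \mapsto (x^1, \dots, x^i + t, \dots, x^n)$. By hypothesis c is geodesic.
It follows that

\[ 0 = \nabla_{c^*} c^* = \nabla_{\partial_i} \partial_i, \quad \text{so } \Gamma^k_{ii} = 0, \quad \forall i, k. \]

Similarly $t \mapsto (x^1, \dots, x^i + t, \dots, x^j + t, \dots, x^n)$ represents $\partial_i + \partial_j$, whence

\begin{align*}
0 &= \nabla_{(\partial_i + \partial_j)}(\partial_i + \partial_j) \\
&= \nabla_{\partial_i}\partial_i + 2\nabla_{\partial_i}\partial_j + \nabla_{\partial_j}\partial_j && \text{by linearity and symmetry} \\
&= 2\nabla_{\partial_i}\partial_j = 2\Gamma^k_{ij}\partial_k.
\end{align*}

So

\[ 0 = \Gamma^k_{ij}, \quad \forall i, j, k \]

everywhere (not just at one point only as in IX.2.06). Hence

\begin{align*}
\partial_k(g_{ij}) &= \Gamma_{ijk} + \Gamma_{jki} && \text{(VIII.6.06)} \\
&= 0 \quad \forall i, j, k
\end{align*}

everywhere; the functions $g_{ij}$ are therefore constant, and so therefore is the
metric on $\mathbf{R}^n$ that they define.

Thus by 1.05 M is locally flat. □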

These results show at once that all the discussion in IX of energy and 
length in affine spaces applies locally in a locally flat manifold, confirming 
that local measurements within it give results exactly like affine geometry. 
(Globally the geometry may differ, however, even on a globally flat manifold: 
Exercise 4a,b.) 

1.06 can be refined slightly for a spacetime, where not all kinds of
geodesic can currently be related to measurements (nothing yet having been
shown to go faster than light). The above proof nowhere required the $\partial_i$ orthonormal,
and we can always choose a basis $b_1, \dots, b_n$ for the vector space
of X such that all the $b_i$ and $b_i + b_j$ are timelike (as in Minkowski space:
(1,0,0,0), (2,1,0,0), (2,0,1,0) and (2,0,0,1)). Hence the proof gives also

1.07. Corollary. A pseudo-Riemannian manifold is locally flat if and only
if around every point there is a chart by which timelike geodesics correspond
to timelike affine curves. □

1.08. Curved Round What? At least he does not call it a “paradox”: 
but the kind of Philosopher of IX.4.05 is often sure that space cannot be 
“curved”, which he equates with “bent” (not with “not flat” in our sense of 
flatness), without a higher space providing directions to bend in. This can 
lead to remarkable ideas! The two chief points to hang on to, talking to him, 
are that flatness may well demand more dimensions than curvature when 
it comes to embeddings (Exercise 5) and that curvature in a geometry may 
arise physically in ways very different from bending (Exercise 6).


Exercises X.1

1. Is there a meaningful definition of a locally parallel vector field, as 
distinct from one parallel over its entire domain? 

2. Show that the map $M \to TM : q \mapsto v_q$ defined in 1.03 on a globally
flat manifold is $C^\infty$. (Use Exercise VII.7.1c.)

3. a) Any connection on the circle is symmetric and locally flat.

b) The connection of Exercise VIII.4.4 and Exercise VIII.6.7 is not globally
flat.

4. a) The cylinder $\{(x,y,z) \in \mathbf{R}^3 \mid x^2 + y^2 = 1\}$, with the metric induced
from the standard Riemannian one on $\mathbf{R}^3$, is globally flat. (Either use
Definition 1.01 and results about parallel transport, or the results of
this section.)

b) Find a pair of geodesics in this manifold with infinitely many intersections
and a pair that do not meet.

c) Show the open cone $\{(x,y,z) \in \mathbf{R}^3 \mid x^2 + y^2 = az^2,\ z < 0\}$ is a
manifold, and flat locally but not globally in the induced metric from
$\mathbf{R}^3$. Show that any geodesic not pointing straight at (0,0,0) can be
infinitely extended, and that whether it has self-intersections depends
on whether a > or < $\frac{1}{3}$. (Hint: consider the cone cut and laid out flat.)
Can a geodesic have infinitely many self-intersections? Are there any
closed geodesics?

5. a) Show by considering parallel transport as rolling without slipping, or
otherwise, that the torus in $\mathbf{R}^3$ defined by
$(x^2 + y^2 + z^2 + 3)^2 = 16(x^2 + y^2)$, with the metric induced from the
standard one on $\mathbf{R}^3$, is not flat.

b) The subset $\{\, x \in \mathbf{R}^4 \mid (x^1)^2 + (x^2)^2 = (x^3)^2 + (x^4)^2 = 1 \,\}$ is a manifold
diffeomorphic to the torus above. The metric on it induced from the
standard Riemannian one on $\mathbf{R}^4$ is globally flat. (Label x by the two
points $(x^1, x^2)$ and $(x^3, x^4)$ in the circle, to get coordinates.)

6. a) Show that the metric of your answer to Exercise IX.4.6 is not globally
flat, by applying 1.06. Is it locally flat?

b) Find an embedding of enough of the plane to contain Fig. IX.4.5 in $\mathbf{R}^3$
to induce your metric on a vertical section through the camel and the
palm tree (or at least find one that reproduces its qualitative features
as regards geodesics: Fig. 1.3).

Do you suppose that hot air “bends space” in this way, in some
$\mathbf{R}^n$, for n unknown?
2. The Curvature Tensor


We now know the theory of locally flat spaces. But for the purposes of the 
“geometrising” of gravitation we mentioned in 4.03, this is like knowing the 
behaviour of light in a Newtonian vacuum: not helpful when there is matter 
present, and not leading to the theory of mirages and microscopes. If we 
are content to suppose spacetime flat, and write in “forces” that “act” in 
it, we can locally do optics and - very artificially - celestial mechanics as 
though in an affine space (though globally spacetime may be topologically 
more interesting than that). But to subsume gravitational forces, like optical
effects, in the geometry, we must handle curvature.

One reason for interest in this approach is cosmological: one may regard 
the transformation of dynamics into geometry as “fiction” if prejudiced, but 
the result carries information not easily accessible in the flat approach. Just
as topological type is limited by possession of a particular local structure
(1.04 et seq.), so it is by the mere admission of one. If the dynamics can be
described by giving spacetime a particular metric, whether or not it “really” 
has that metric, then spacetime is a manifold such as can have that metric: 
the topological implications follow. 

So the cosmos is rather different from the classical solar system with its 
mathematically equivalent descriptions using either earth or sun as unmoving 
centre, and only a taste for simplicity dictating choice of the sun. In the small, 
“flat” and “unflat” physics may or may not be equivalent (depending on the 
theories) but in the large the unflat has implications that cannot be obtained 
directly from the flat, which is either local or assumes that spacetime is $\mathbf{R}^4$.
Let us look, then, at curved spaces. 

2.01. Local Curvature. If “curved” intrinsically in a manifold M is to mean
“not flat” in the sense of “flat” explored in §1, a measure of curvature must
be a measure of the breakdown of the defining property of flatness: parallel
transport from p to q independent of the path between them.

The first thing to observe is that if we know how far this breaks down for
small curves, we know how it breaks down for all curves: Fig. 2.1 illustrates
this. Since parallel transport along a piece of curve, followed by parallel
transport back along it, is the identity on tangent vectors at the starting
point, it is easy to see that the difference between parallel transport along
$c$ and $\tilde{c}$ is in a suitable sense the “sum” of the differences between small curves
like $c'$ and $\tilde{c}'$. Curvature is thus a local property of M: if we know about
the differences between curves up to an arbitrarily small size, we know the
difference between parallel transport along any two curves, between any p and
q, that can be related by a picture like Fig. 2.1. (If M is simply connected,
this means any two curves at all from p to q; otherwise, algebraic relations
like those between local and global flatness are involved.)

Fig. 2.1 suggests immediately a natural approach to measurement of
curvature. First, make all the curves involved lie smoothly together in a
parametrised surface (IX.3.03)

\[ S : [0,1] \times [a,b] \to M \]

with $S(t,a) = p$, $S(t,b) = q$, $\forall t \in [0,1]$ and $S(0,s) = c(s)$, $S(1,s) = \tilde{c}(s)$,
$\forall s \in [a,b]$, and define the $c'$ and $\tilde{c}'$ curves as the composites of S with little
rectangular paths in $[0,1] \times [a,b]$. Now the lack of route-independence of
parallel transport between p and q is obviously equally given by the difference
of parallel transport along $c$ and back along $\tilde{c}$, from the identity on $T_pM$. So
we are led to say that the total curvature along any piece of surface is the
difference from the identity of parallel transport once around its edge.

The difference between transport along $c$ and $\tilde{c}$, then, is given by the
total curvature of any surface between them, and this is obtained by adding
the total curvature of all the small pieces □ of the surface, in the manner of 
Fig. 2.1. Now, in the manner of an electromagnetism course proving Stokes’s
Theorem, all we need do is substitute “infinitesimal” for “small” and “integrating”
for “adding”. Then the “infinitesimal curvature” at a point $x \in M$
becomes the assignment, to each infinitesimal piece P of surface with a corner
at x, of the “infinitesimal rotation” away from the identity $T_xM \to T_xM$
that results from parallel transport around the edge of P. Integrating this
over the whole surface gives the total curvature.
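
A numerical sketch may make this concrete. Assume the unit sphere with coordinates $(\theta, \phi)$ and metric diag(1, sin²θ), whose Christoffel symbols we quote as standard facts rather than derive here; the code carries a vector once around a circle of latitude and compares the angle it has turned with the area of the enclosed polar cap. The function name and the choice of integrator are ours.

```python
import math

def transport_around_latitude(theta0, v_theta, v_phi, steps=2000):
    """Parallel transport (V^theta, V^phi) once around the circle theta = theta0.

    Unit sphere, coordinates (theta, phi) with metric diag(1, sin^2 theta).
    Along the curve phi = s the transport equations are
        dV^theta/ds = -Gamma^theta_{phi phi} V^phi = sin(theta0) cos(theta0) V^phi
        dV^phi/ds   = -Gamma^phi_{theta phi} V^theta = -cot(theta0) V^theta
    integrated here by the classical Runge-Kutta scheme.
    """
    sc = math.sin(theta0) * math.cos(theta0)
    cot = math.cos(theta0) / math.sin(theta0)

    def rhs(a, b):
        return sc * b, -cot * a

    h = 2.0 * math.pi / steps
    a, b = v_theta, v_phi
    for _ in range(steps):
        k1 = rhs(a, b)
        k2 = rhs(a + 0.5 * h * k1[0], b + 0.5 * h * k1[1])
        k3 = rhs(a + 0.5 * h * k2[0], b + 0.5 * h * k2[1])
        k4 = rhs(a + h * k3[0], b + h * k3[1])
        a += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        b += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return a, b

theta0 = math.pi / 4                       # colatitude 45 degrees
a, b = transport_around_latitude(theta0, 1.0, 0.0)
# Angle turned, measured in the orthonormal frame (d_theta, d_phi / sin theta0):
turned = math.atan2(b * math.sin(theta0), a) % (2 * math.pi)
cap_area = 2 * math.pi * (1 - math.cos(theta0))
print(turned, cap_area)                    # both about 1.84
```

With these orientation conventions the rotation produced by one circuit equals the area enclosed, which on the unit sphere is the integral of a curvature of 1 per unit area over the cap.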

This approach works perfectly. However, it needs to be somewhat more 
precisely expressed. Apart from being careful about limits, and replacing 
infinitesimals by tangent space constructions (or setting up infinitesimals 
rigorously), one must be precise about the “adding” of total curvatures, and 
the integration of infinitesimal rotations, at different places, that we were so 
glib about just now. This can be done, and extremely elegantly: the main 
tool is the “moving frame” and the associated formulation of connections, 
due to Cartan. Unfortunately, this involves differential forms, and some theory
of Lie groups (neither of which we have space to treat properly), for
the geometry to be rigorous and visible; without them it means a relapse to
coordinate manipulations and gropings in the dark. So we shall not be integrating
curvature¹, but we shall think of its meaning as above. This yields
directly some of the properties it should have and guides our search for a 
rigorous definition in terms of the machinery we already have; once found 
this will let us prove these and other properties, and perform computations. 
We shall continue to carry the above geometry as a way to explain what is 
happening, even without having the formalism to show why it is what is happening,
rigorously. The other choices would be to omit adequate geometrical
explanation, or to cover so much purely mathematical material as never to 
approach physics: both defeating our objective of making geometry available 
in undergraduate mathematics for physics courses like that from which this 
book evolved. So we shall accompany our less geometrical formal proofs with 
more geometrical handwaving, to be made more precise in a later volume, or 
by the reader’s digging into wholly maths-oriented texts such as [Spivak] or 
[Kobayashi and Nomizu]. 

So what properties should $p \mapsto$ (curvature at p) have?

First, if at all $p \in M$ it vanishes, then M should be locally flat. Round
any $p \in M$ we can find a simply connected neighbourhood U (say, the unit
ball in some chart). For any two curves between $q, q' \in U$ we can find a surface
between them, integrate curvature over it, and get the difference between 
parallel transport along them. But this integral of a vanishing quantity should 
be zero, so there is no difference: M is locally flat. (The proof using our 
rigorous definitions is Theorem 2.05, but this is why it is true.) 

Secondly, what is an “infinitesimal rotation”? By the above, it is an
infinitesimal displacement from the identity on $T_pM$. We have observed, in
discussing Theorem VIII.4.06, that an infinitesimal displacement $\delta$ from p
in the old language means a tangent vector v at p to M in the new. In
this case v is tangent, at the identity $I_p : T_pM \to T_pM$, to the vector space
$L(T_pM; T_pM)$ and we can free it to be an element V of $L(T_pM; T_pM)$. What
kind of an element?

$T_pM$ has metric tensor $G_p$, and since $\nabla$ is compatible with G parallel
transport along any small finite curve from p to p must be a rotation. An
“infinitesimal” rotation then will be tangent at $I_p$ to $O(T_pM)$, the collection
of orthogonal operators on $T_pM$, which sits naturally as a manifold embedded
in $L(T_pM; T_pM)$. (As the rotations of the plane, essentially the circle of
angles to turn through, embed in 4-dimensional $L(\mathbf{R}^2; \mathbf{R}^2)$.) Not stopping to
prove this submanifold property, we look at the properties that V must have.
Representing v, as in VIII.§1, by a curve c with $c(0) = I_p$, to be sure that v
is tangent to $O(T_pM)$ we must have $c(t) \in O(T_pM)$, $\forall t$. ($c^*(0)$ is certainly
tangent to $O(T_pM)$ if c is in $O(T_pM)$.) So for $x, y \in T_pM$, $(c(t)(x)) \cdot (c(t)(y))$
is the same as $x \cdot y$ for all t; $(c(\,)(x)) \cdot (c(\,)(y))$ is constant. Thus for given
x, y

\begin{align*}
0 &= \frac{d}{dt}\big((c(t)(x)) \cdot (c(t)(y))\big)(0) \\
&= \big(\nabla_{c^*} c(\,)(x)(0)\big) \cdot c(0)(y) + c(0)(x) \cdot \big(\nabla_{c^*} c(\,)(y)(0)\big),
\end{align*}

with the usual connection on $L(T_pM; T_pM)$. Hence, $0 = V(x) \cdot y + x \cdot V(y)$
(why? think about transport in $L(T_pM; T_pM)$) so $V(x) \cdot y = -x \cdot V(y)$
always.

¹ Strictly one integrates the curvature form.

Equivalently (X.2.07),

\[ V(x) \cdot x = 0 \quad \forall x \in T_pM. \]

Such a V is called skew-self-adjoint, since V is exactly the negative of its 
adjoint. Any “infinitesimal rotation” at p can thus be given by a skew-self- 
adjoint operator on T p M (and in fact any such operator can be realised this 
way). Analogously, the traceless operators studied in IX.§6 are “infinitesimal
unimodular operators”. 

The coordinate form is immediate from IV.3.14; with respect to an orthonormal
basis when $G_p$ is an inner product, skew-self-adjointness is equivalent
to the condition $[V]^i_j = -[V]^j_i$ and is called skew-symmetry, a usage
(like “symmetry”) which it is wiser to avoid in the indefinite case.

The reader should convince himself that in a rotation in ordinary 2 or 3- 
space, every vector’s tip is at any moment moving at right angles to the vector 
(this is the condition V(x) • x = 0), and interpret similarly the equivalent 
condition V(x) • y = —x • V(y). Skew-self-adjointness just extends these 
intuitive facts about vectorial rates of rotation to general dimensions and 
metric tensors. 
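
A quick numerical check illustrates both the condition and the warning about the indefinite case. This is a sketch of our own: it differentiates a curve of rotations and a curve of “boosts” of the plane at the identity and tests V(x) · y = −x · V(y) in the metric each curve preserves, here diag(1, 1) and diag(1, −1) respectively.

```python
import math

def derivative_at_zero(curve, h=1e-6):
    """Central-difference derivative at t = 0 of a curve of 2x2 matrices."""
    P, M = curve(h), curve(-h)
    return [[(P[i][j] - M[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def apply(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def dot(x, y, signs):
    """Metric with diagonal components `signs` in the chosen basis."""
    return signs[0] * x[0] * y[0] + signs[1] * x[1] * y[1]

def rotation(t):       # orthogonal for the inner product diag(1, 1)
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def boost(t):          # orthogonal for the indefinite metric diag(1, -1)
    return [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]

x, y = [1.0, 2.0], [-3.0, 0.5]
for name, curve, signs in [("rotation", rotation, (1, 1)), ("boost", boost, (1, -1))]:
    V = derivative_at_zero(curve)
    lhs = dot(apply(V, x), y, signs)
    rhs = -dot(x, apply(V, y), signs)
    print(name, [[round(v, 6) for v in row] for row in V], round(lhs, 6), round(rhs, 6))
```

Both generators satisfy the condition, but the matrix of the boost generator is symmetric, not skew-symmetric: in the indefinite case skew-self-adjointness cannot be read off from the matrix alone.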

What, finally, should an “infinitesimal piece of area” be? 

Before leaving “small” for “infinitesimal” we were talking about the images
of little rectangles. As “small” gets smaller, these images look more and
more like parallelograms, with straighter and straighter sides. “In the limit”
then, the sides meeting at p turn into infinitesimal displacements - tangent
vectors - at p ($\mathbf{u}$, $\mathbf{v}$ say), and we have an actual parallelogram $P(\mathbf{u}, \mathbf{v})$ defined
by them in their common plane Q (which is the “linear approximation” at p
to the surface itself). So curvature at p should give us, for each way Q that the
surface can pass through p, an “infinitesimal rotation per area” as we shall
be “integrating over an area”. (Clearly it must depend on the attitude of Q:
if $S^1$ is the equator in $S^2$, and $S^2 \times \mathbf{R} \subset \mathbf{R}^4$ has the induced metric, the surface
$S^1 \times \mathbf{R}$ is just a cylinder, and parallel transport of vectors in $T(S^1 \times \mathbf{R})$
is clearly independent of route (prove it!) while $S^2 \times \{0\}$ which meets it
in any $p \in S^1 \times \{0\}$ carries the “nonflatness” of $S^2 \times \mathbf{R}$.) This “attitude-dependent,
per-area-of-parallelogram” behaviour means we want on each Q
a skew-symmetric bilinear form (cf. Exercise V.1.11) with values in the space
of “infinitesimal rotations”. Skew-symmetry is very natural here: switch $\mathbf{u}$
and $\mathbf{v}$ and we reverse the way we go round the parallelogram $P(\mathbf{u}, \mathbf{v})$, which
should obviously give minus the rotation.

It turns out that the rotation per area, though it depends on Q, does so
linearly, so that the curvature is described by a skew-symmetric bilinear map
assigning to any $(\mathbf{u}, \mathbf{v}) \in T_pM \times T_pM$ the “infinitesimal rotation” that results
from parallel transport around the “infinitesimal parallelogram” $P(\mathbf{u}, \mathbf{v})$.

To sum up, we are looking for a skew-symmetric bilinear map from
$T_pM \times T_pM$ to the space of skew-self-adjoint operators $T_pM \to T_pM$. Equivalently,
we want a linear map

\[ R_p : T_pM \otimes T_pM \to L(T_pM; T_pM) \]

which is the same by V.1.08 as a map

\[ T_pM \otimes T_pM \to (T_pM)^* \otimes T_pM \]

which corresponds to an element of

\[ (T_pM \otimes T_pM)^* \otimes ((T_pM)^* \otimes T_pM) = (T^1_3 M)_p, \]

namely a $\binom{1}{3}$-tensor at p.

We shall use R to indicate both the collection of maps $R_p$ as above, and
the $\binom{1}{3}$-tensor field to which it naturally corresponds. When the distinction
matters the context will make clear which aspect we are using.

We evidently want $R_p$ to depend smoothly on p (we always want things
to depend smoothly on p), and the skew-self-adjointness and skew-symmetry
conditions give

\begin{align*}
((R_p(\mathbf{u}, \mathbf{v}))(\mathbf{x})) \cdot \mathbf{y} &= -\mathbf{x} \cdot ((R_p(\mathbf{u}, \mathbf{v}))(\mathbf{y})) && \text{as real numbers} \\
R_p(\mathbf{u}, \mathbf{v}) &= -R_p(\mathbf{v}, \mathbf{u}) && \text{as operators.}
\end{align*}

These identities are satisfied by the tensor we are about to set up, along with
others that are hard to motivate geometrically without a deep analysis of the
“torsion-zero” condition on the connection which yields them.


2.02. Small Pieces. Let us look for what $R_p(\mathbf{u}_p, \mathbf{v}_p)$ “should” be for given
$\mathbf{u}_p, \mathbf{v}_p \in T_pM$, by parallel transport of a typical $\mathbf{t}_p \in T_pM$ around a small
bent parallelogram that in the limit of smallness tends to the unbent one
in $T_pM$ defined by $\mathbf{u}_p$ and $\mathbf{v}_p$. Extending $\mathbf{u}_p$, $\mathbf{v}_p$ to vector fields $\mathbf{u}$, $\mathbf{v}$ with
corresponding local flows $\phi$, $\psi$, we transport $\mathbf{t}_p$ along the solution curve of
$\mathbf{u}$ through p to $\phi_t(p)$ and then along a curve of $\mathbf{v}$ to $\psi_s(\phi_t(p))$. Similarly,
we can transport it via $\psi_s(p)$ to $\phi_t(\psi_s(p))$. If $\mathbf{u}$ and $\mathbf{v}$ commute we then
have two vectors tangent to M at the same point $\phi_t(\psi_s(p)) = \psi_s(\phi_t(p)) = q$,
say, and we can subtract one from the other to find how they differ. (If they
don’t commute we have to transport $\mathbf{t}$ across a gap: hence in the limit we
can expect a correction term involving $[\mathbf{u}, \mathbf{v}]$, which is the limit per unit area
of the gap (cf. VII.7.03). For the moment assume they commute.) Labelling
the four parallel transport maps involved:


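$$\begin{array}{ccc}
T_{\psi_s(p)}M & \xrightarrow{\ \tau^4_t\ } & T_qM \\
{\scriptstyle \tau^2_s}\,\big\uparrow & & \big\uparrow\,{\scriptstyle \tau^3_s} \\
T_pM & \xrightarrow{\ \tau^1_t\ } & T_{\phi_t(p)}M
\end{array}$$

we get a “difference” vector $\big(\tau^3_s\tau^1_t(t_p) - \tau^4_t\tau^2_s(t_p)\big)$ in $T_qM$.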

More conveniently, let us extend $t_p$ also to a field $t$, and look at the
vector

$$(\tau^3_s\tau^1_t)^{-1} t_q - (\tau^4_t\tau^2_s)^{-1} t_q$$

which equally measures the curvature of our little piece of area and stays in
$T_pM$ as we vary $s$ and $t$. We expect curvature to be “per area”, and the
closer our bent piece of area approximates a parallelogram as it shrinks the
more nearly its area goes as $ts$. This suggests that we look for $R_p(u, v)t$ as
the limit

$$\lim_{(s,t)\to 0} \frac{(\tau^3_s\tau^1_t)^{-1} t_q - (\tau^4_t\tau^2_s)^{-1} t_q}{ts} .$$


This can be expressed, if it exists, in terms of limits we have taken before. 
First, rewrite it as 

(,5ho h - (^.00 - *p) 

- - T?~t Mp) - - i p ))) . 

Since subtraction is continuous, this gives (using Exercise 1) 



By continuity of and division by non-zero s and f, this is 



which by VIII.4.06 is exactly 

£37 

alias, by VIII.4.06 again, 

$$\nabla_{u_p}(\nabla_v t) - \nabla_{v_p}(\nabla_u t) .$$

So we have a formula for $R_p(u_p, v_p)t_p$ (if we can prove it independent of the
extensions $u$, $v$, $t$), when $u$ and $v$ commute: $R_p(u_p, v_p)$ is exactly the extent
to which $\nabla_u$ and $\nabla_v$ fail to do the same. Rather than compute directly
the correction factor when $u$ and $v$ do not commute (which would involve
the proof of the assertion left unproved in VII.7.03) we treat the above as
motivation for our definition of curvature with a correction factor that is
clearly reasonable, and that by Exercise 2e is the only choice compatible
with the desired independence of extensions.
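The “rotation per area” idea can be tried numerically. The sketch below is an illustration only, not part of the text's argument: it assumes the unit sphere with coordinates $(\theta, \phi)$, the standard Levi-Civita connection coefficients $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$, and the hypothetical helper names `transport_on_sphere` and `rotation_per_area`. It transports a vector around a coordinate rectangle by integrating the parallel transport equation and divides the resulting rotation angle by the enclosed area.

```python
import math

def transport_on_sphere(v, start, end, steps=2000):
    """Parallel transport v = (v_theta, v_phi) along the coordinate-straight
    segment from start to end (each a (theta, phi) pair) on the unit sphere,
    integrating dv^i/dt = -Gamma^i_{jk} (dx^j/dt) v^k with classical RK4."""
    d_theta, d_phi = end[0] - start[0], end[1] - start[1]

    def rhs(t, w):
        theta = start[0] + t * d_theta
        s, c = math.sin(theta), math.cos(theta)
        # Assumed sphere coefficients: Gamma^theta_{phi phi} = -s*c,
        # Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = c/s.
        return (s * c * d_phi * w[1],
                -(c / s) * (d_theta * w[1] + d_phi * w[0]))

    h = 1.0 / steps
    for n in range(steps):
        t = n * h
        k1 = rhs(t, v)
        k2 = rhs(t + h / 2, (v[0] + h / 2 * k1[0], v[1] + h / 2 * k1[1]))
        k3 = rhs(t + h / 2, (v[0] + h / 2 * k2[0], v[1] + h / 2 * k2[1]))
        k4 = rhs(t + h, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def rotation_per_area(theta0, a, b):
    """Rotation angle (magnitude) picked up around the coordinate rectangle
    [theta0, theta0 + a] x [0, b], divided by the rectangle's area."""
    corners = [(theta0, 0.0), (theta0 + a, 0.0), (theta0 + a, b),
               (theta0, b), (theta0, 0.0)]
    v = (1.0, 0.0)  # unit vector along the theta direction at the start
    for start, end in zip(corners, corners[1:]):
        v = transport_on_sphere(v, start, end)
    # Components in the orthonormal frame (d_theta, d_phi / sin(theta)).
    angle = math.atan2(math.sin(theta0) * v[1], v[0])
    area = b * (math.cos(theta0) - math.cos(theta0 + a))
    return abs(angle) / area

for size in (0.4, 0.1, 0.01):
    print(size, rotation_per_area(1.0, size, size))
```

On the unit sphere the ratio stays close to 1 however small the rectangle, which is the sense in which curvature is a rotation “per area” there.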

2.03. Definition. The curvature of a connection $\nabla$ on $M$ is the map

$$R : T^1M \times T^1M \to L(T^1M; T^1M)$$

defined by

$$R(u, v)t = \nabla_u(\nabla_v t) - \nabla_v(\nabla_u t) - \nabla_{[u,v]} t .$$

It is immediate that $R(u, v)$ is indeed linear, and so does lie in the real
vector space $L(T^1M; T^1M)$. Furthermore (Exercise 2) the vector $(R(u, v)t)_p$
depends only on $u_p$, $v_p$ and $t_p$, so we may write it as $R_p(u_p, v_p)t_p$ to indicate
independence of the extensions. (We cannot expect to define $R_p$ without
taking extensions beyond $p$ and $T_pM$, as it is not $T_pM$ that is curved. It
is remarkable that though all three terms in the definition depend on the
choice of extensions, $R_p$ does not.) As above we may consider $R$ as a
$\binom{1}{3}$-tensor field on $M$, the curvature tensor field of $\nabla$. If $\nabla$ is the Levi-Civita
connection, $R$ is the Riemann tensor on $M$ (not “pseudo-Riemann”, even if
$M$ is pseudo-Riemannian).

(WARNING: some writers call minus our $R$ the Riemann tensor. As
with the Lorentz metric, it is not common in the journals to say which sign
you are using.)

The first property we argued that R should have is that its vanishing 
everywhere should imply flatness. This depends on the following lemma. 

2.04. Lemma. Let $S : I \times J \to M$ be a parametrised surface, $v$ a vector field
along $S$. Then in the notation of IX.3.03, if $S(t, s) = x$,

$$\nabla_t \nabla_s v(t, s) - \nabla_s \nabla_t v(t, s) = R_x\big(S_t^*(t, s), S_s^*(t, s)\big)\, v(t, s) .$$

Proof. If at $(t, s) \in I \times J$ either $S_t^*$ or $S_s^*$ vanishes, both sides are zero. If
not, $D_{(t,s)}S$ is injective and we can use Exercise VII.2.7a to find a chart
$U \to \mathbb{R}^n$ around $S(t, s)$ in which $S_t^*$, $S_s^*$ are the restrictions to $S$ of $\partial_1$, $\partial_2$,
which commute. Extend any $w$ along $S$ to $\tilde w$ on $U$ with $\tilde w^i(x^1, \dots, x^n) =
w^i(x^1, x^2)$, $i = 1, \dots, n$. The result follows since the definitions then imply
$\nabla_t w = (\nabla_{\partial_1} \tilde w) \circ S$, $\nabla_s w = (\nabla_{\partial_2} \tilde w) \circ S$; applying this with $w = v$, $\nabla_t v$ and
$\nabla_s v$ in turn makes the left hand side equal the definition of the right, applied
to $\tilde v$. □

2.05. Theorem. A symmetric connection $\nabla$ on a manifold $M$ is locally flat
if and only if its curvature tensor field $R$ is identically zero.

Proof. If $M$ has a chart around $p$ such that parallel transport is given by
the usual affine one in $\mathbb{R}^n$, we have commuting basis vector fields $\partial_i$. Using
the expansion in 2.02 of the definition of $R_p(u, v)$ when $u$, $v$ commute, it is
immediate that the operators $R_p(\partial_i, \partial_j)$ on $T_pM$ are zero. Hence $R_p(u, v) = 0$
in general, by linearity.

Suppose that in the domain of the chart $\phi : U \to \mathbb{R}^n$ around $p$ the
curvature tensor vanishes. Then without loss of generality suppose $\phi(p) =
(0, \dots, 0)$ and $\phi(U) = \{(x^1, \dots, x^n) \in \mathbb{R}^n \mid |x^i| < 1, \ \forall i\}$, an open box
around the origin.

If $v_p$ is any vector at $p$, define $v$ on $U$ as follows. At points “on” the
$x^1$-axis (via $\phi$), define it by parallel transport along that axis.

For points in a line

$$L = \{(x^1, t, 0, \dots, 0) \mid |t| < 1\} ,$$

define it by parallel transport along $L$ from $(x^1, 0, \dots, 0)$ where it was defined
in the first step.

Inductively, for points of the form $(x^1, \dots, x^i, 0, \dots, 0)$ define it by parallel
transport along

$$\{(x^1, \dots, x^{i-1}, t, 0, \dots, 0) \mid |t| < 1\}$$

from $(x^1, \dots, x^{i-1}, 0, \dots, 0)$, until $i = n$ (Fig. 2.4). Evidently $v$ is a smooth vector
field with value $v_p$ at $p$; we claim it is parallel.

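Now at points $(x^1, 0, \dots, 0)$ we have $\nabla_{\partial_i} v = 0$ for all $i$, by construction.
At a point $(x^1, x^2, 0, \dots, 0)$ we have only $\nabla_{\partial_1} v$ to check. Defining $S(t, s) =
(x^1 + t, x^2 + s, 0, \dots, 0)$ we have

$$\begin{aligned}
0 &= R(S_t^*, S_s^*)v && \text{since } R = 0 \text{ by assumption} \\
  &= \nabla_t(\nabla_s v) - \nabla_s(\nabla_t v) && \text{by 2.04} \\
  &= -\nabla_s(\nabla_t v) && \text{since } \nabla_s v \text{ is } 0 \text{ by construction of } v.
\end{aligned}$$

Thus $\nabla_t v$ is parallel along $c : s \mapsto (x^1, x^2 + s, 0, \dots, 0)$; but at $c(-x^2) =
(x^1, 0, \dots, 0)$ it is zero by construction, so it is zero also at $(x^1, x^2, 0, \dots, 0)$.
But $\nabla_t v(t, s)$ is exactly $(\nabla_{\partial_1} v)_q$ where $q = S(t, s)$, so we have shown that
$\nabla_{\partial_1} v$ is 0.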

Onward by induction: we parallel transport $\nabla_{\partial_1} v, \dots, \nabla_{\partial_i} v$ from
$(x^1, \dots, x^i, x^{i+1}, 0, \dots, 0)$ to $(x^1, \dots, x^i, 0, \dots, 0)$ where the first $(i - 1)$ are
zero by induction and the $i$-th by construction. All of $\nabla_{\partial_{i+1}} v, \dots, \nabla_{\partial_n} v$
vanish at $(x^1, \dots, x^{i+1}, 0, \dots, 0)$ by construction.

Thus $\nabla_{\partial_i} v = 0$ everywhere in $U$ for all $i$, so by linearity at each point
$q \in U$ we have $\nabla_u v = 0$ for all $u \in T_qM$, so $v$ is parallel along all curves in
$U$. Thus parallel transport of $v_p$ from $p$ to $q \in U$ is independent of the curve
in $U$.

Fig. 2.4

But $v_p$ was arbitrary, hence $U$ is globally flat. Finally, $p$ was arbitrary
so $M$ is locally flat. □

2.06. Corollary. A pseudo-Riemannian manifold has curvature identically
zero if and only if around every point we can so choose coordinates that the
$g_{ij}$ become constant functions $\pm\delta_{ij}$.

Proof. Apply 1.05 and choose orthonormal coordinates on the affine space.

□ 

We next check the skew symmetry and skew-self-adjointness we obtained 
loosely in 2.01 for R, along with two other properties that come from the 
symmetry of V. 

2.07. Lemma. The Riemann tensor satisfies, for any $u$, $v$, $w$, $x$ on $M$,

A) $R(u, v) = -R(v, u)$

B) $R(u, v)w \cdot x = -R(u, v)x \cdot w$

C) $R(u, v)w + R(v, w)u + R(w, u)v = 0$

D) $R(u, v)w \cdot x = R(w, x)u \cdot v$

(A holds for the curvature tensor of any connection, C for that of any
symmetric connection, B for that of any connection compatible with the
metric. D requires that $R$ be the Riemann tensor.)

Proof. 

A) is an immediate consequence of the definition, since $\nabla_w$ depends linearly
on $w$ and $[u, v] = -[v, u]$.

C) is almost as immediate. It suffices to prove it at any $p \in M$ for $u_p$, $v_p$,
$w_p$, which we extend arbitrarily to commuting vector fields $u$, $v$, $w$ in
a neighbourhood of $p$. (Alternatively, work with completely arbitrary
$u$, $v$, $w$ and use the Jacobi identity (Exercise VII.7.6).) Thus with the
Lie bracket terms vanishing,

$$\begin{aligned}
& R(u, v)w + R(v, w)u + R(w, u)v \\
&= (\nabla_u\nabla_v w - \nabla_v\nabla_u w) + (\nabla_v\nabla_w u - \nabla_w\nabla_v u) + (\nabla_w\nabla_u v - \nabla_u\nabla_w v) \\
&= (\nabla_u\nabla_v w - \nabla_v\nabla_w u) + (\nabla_v\nabla_w u - \nabla_w\nabla_u v) + (\nabla_w\nabla_u v - \nabla_u\nabla_v w) && \text{by VIII.5.02} \\
&= 0 .
\end{aligned}$$

This result is called Bianchi’s first identity. 

B) we deduce as follows. Any linear operator $A$ is skew-self-adjoint if and
only if $Aw \cdot w = 0$ for all $w$, since this implies

$$0 = A(x + y) \cdot (x + y) = Ax \cdot x + Ax \cdot y + Ay \cdot x + Ay \cdot y = Ax \cdot y + Ay \cdot x$$

and the converse is trivial. We can again suppose $u$, $v$ commuting, so it
suffices to prove that

$$R(u, v)w \cdot w = 0$$

which is equivalent to

$$(\nabla_u\nabla_v w) \cdot w = (\nabla_v\nabla_u w) \cdot w$$

for commuting $u$, $v$, by applying Definition 2.03. Now by VIII.6.03

$$v(w \cdot w) = (\nabla_v w) \cdot w + w \cdot \nabla_v w = 2(\nabla_v w) \cdot w$$

and

$$u(v(w \cdot w)) = u(2(\nabla_v w) \cdot w) = 2(\nabla_u\nabla_v w) \cdot w + 2\nabla_v w \cdot \nabla_u w ,$$

so

$$(\nabla_u\nabla_v w) \cdot w = \tfrac{1}{2} u(v(w \cdot w)) - \nabla_v w \cdot \nabla_u w .$$

Now

$$\begin{aligned}
(\nabla_v\nabla_u w) \cdot w &= \tfrac{1}{2} v(u(w \cdot w)) - \nabla_u w \cdot \nabla_v w && \text{by similar reasoning} \\
&= \tfrac{1}{2} v(u(w \cdot w)) - \nabla_v w \cdot \nabla_u w && (G \text{ symmetric}) \\
&= \tfrac{1}{2} u(v(w \cdot w)) - \nabla_v w \cdot \nabla_u w && \text{since } v(u(f)) = u(v(f)) \ \forall f, \text{ by the assumption that } [u, v] = 0 \\
&= (\nabla_u\nabla_v w) \cdot w && \text{as required.}
\end{aligned}$$

(But why it is true is that $R(u, v)$ is an “infinitesimal rotation”, discussed
in 2.01.)

D), a fact as simple as it is surprising (since the need to think about rotations 
with four directions to consider, and zero torsion to bear in mind, makes 
it less than easy to see geometrically why something so neat should hold), 
is an algebraic consequence of A, B and C. It is much more memorable 
than its proof, which is summarised as Exercise 3. □ 

2.08. Components. As usual, just insert the $\partial_i$. Thus the components of
$R$ are defined by

$$R(\partial_k, \partial_l)(\partial_j) = R^i_{jkl}\,\partial_i ,$$

so that $R^i_{jkl}$ is the $i$-th component of the image of $\partial_j$ under $R(\partial_k, \partial_l)$. (This
order for the indices is a little odd in relation to the left-hand side, but is
a lot older than the point of view which produced the latter and is utterly
standard.) As important as the $R^i_{jkl}$ are the dot products in B and D above:
in components we get

$$(*) \qquad (R(\partial_k, \partial_l)\partial_j) \cdot \partial_i = (R^h_{jkl}\,\partial_h) \cdot \partial_i = R^h_{jkl}(\partial_h \cdot \partial_i) = g_{hi}R^h_{jkl} = R_{ijkl}$$

in the usual “lowered index” notation for an application of $G_\downarrow$ to one part of
a tensor.

Though geometrically less primitive than the $R^i_{jkl}$, which are the components
of what we geometrically decided curvature ought to be, the $R_{ijkl}$
carry the same information (since $R^i_{jkl} = g^{ih}R_{hjkl}$) and are manipulatively
more convenient. The identities above become

A) $R_{ijkl} = -R_{ijlk}$

B) $R_{ijkl} = -R_{jikl}$

C) $R_{ijkl} + R_{iklj} + R_{iljk} = 0$ (Bianchi’s first identity)

D) $R_{ijkl} = R_{klij}$ (using $(*)$ for both sides and applying A and B).

(The $R^i_{jkl}$ express A and C equally well, but we need the $\partial_i$ orthonormal
for skew-symmetry to be just $R^i_{jkl} = -R^j_{ikl}$, and if $R \neq 0$ we can’t get them
orthonormal everywhere, by 2.06. The + and - signs involved for general
skew-self-adjointness, depending on whether pairs of vectors have the same
type, complicate things further.)
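Finding the components of $R$ in terms of those of $\nabla$ we proceed as follows.

$$\begin{aligned}
R^i_{jkl}\,\partial_i &= \nabla_{\partial_k}\nabla_{\partial_l}\partial_j - \nabla_{\partial_l}\nabla_{\partial_k}\partial_j && \text{since } \partial_k, \partial_l \text{ commute} \\
&= \nabla_{\partial_k}(\Gamma^h_{lj}\partial_h) - \nabla_{\partial_l}(\Gamma^h_{kj}\partial_h) && \text{by definition of the } \Gamma^i_{jk} \\
&= \big(\partial_k(\Gamma^h_{lj})\partial_h + \Gamma^h_{lj}\nabla_{\partial_k}(\partial_h)\big) - \big(\partial_l(\Gamma^h_{kj})\partial_h + \Gamma^h_{kj}\nabla_{\partial_l}(\partial_h)\big) && \text{by VIII.3.01.C iv)} \\
&= \big(\partial_k\Gamma^i_{lj} + \Gamma^h_{lj}\Gamma^i_{kh} - \partial_l\Gamma^i_{kj} - \Gamma^h_{kj}\Gamma^i_{lh}\big)\partial_i && \text{changing dummy index on the 1st and 3rd terms.}
\end{aligned}$$

So

$$R^i_{jkl} = \partial_k\Gamma^i_{lj} - \partial_l\Gamma^i_{kj} + \Gamma^h_{lj}\Gamma^i_{kh} - \Gamma^h_{kj}\Gamma^i_{lh} ,$$

and

$$\begin{aligned}
R_{ijkl} &= g_{ih}R^h_{jkl} \\
&= g_{ih}(\partial_k\Gamma^h_{lj} - \partial_l\Gamma^h_{kj}) + g_{ih}(\Gamma^m_{lj}\Gamma^h_{km} - \Gamma^m_{kj}\Gamma^h_{lm}) \\
&= g_{ih}(\partial_k\Gamma^h_{lj} - \partial_l\Gamma^h_{kj}) + (\Gamma^m_{lj}\Gamma_{kmi} - \Gamma^m_{kj}\Gamma_{lmi})
\end{aligned}$$

in terms of the components of $G$ and $\nabla$. Substitution from VIII.6.06 gives
formulae in terms of the $g_{ij}$, and their first and second partial derivatives,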
alone; but if the reader likes hairy formulae for their own sake that much, by 
now he has forsaken this book for another. 
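As a check on the index bookkeeping, the component formula $R^i_{jkl} = \partial_k\Gamma^i_{lj} - \partial_l\Gamma^i_{kj} + \Gamma^h_{lj}\Gamma^i_{kh} - \Gamma^h_{kj}\Gamma^i_{lh}$ can be evaluated numerically. The following is a minimal sketch under stated assumptions: the convention $\nabla_{\partial_j}\partial_k = \Gamma^i_{jk}\partial_i$, the standard connection coefficients of the unit sphere and of the plane in polar coordinates (neither is computed in this section), central differences for the partial derivatives, and hypothetical function names.

```python
import math
from itertools import product

def christoffel_sphere(x):
    """G[i][j][k] = Gamma^i_{jk} on the unit sphere, coordinates (theta, phi)."""
    theta = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def christoffel_polar(x):
    """G[i][j][k] = Gamma^i_{jk} for the flat plane in polar coordinates (r, theta)."""
    r = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def riemann(gamma, x, eps=1e-5):
    """R[i][j][k][l] = R^i_{jkl} at the point x, from the component formula,
    with the derivatives of Gamma taken by central differences."""
    n = len(x)
    G = gamma(x)
    dG = []  # dG[k][i][a][b] approximates d_k Gamma^i_{ab}
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        Gp, Gm = gamma(xp), gamma(xm)
        dG.append([[[(Gp[i][a][b] - Gm[i][a][b]) / (2 * eps)
                     for b in range(n)] for a in range(n)] for i in range(n)])
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i, j, k, l in product(range(n), repeat=4):
        value = dG[k][i][l][j] - dG[l][i][k][j]
        for h in range(n):
            value += G[h][l][j] * G[i][k][h] - G[h][k][j] * G[i][l][h]
        R[i][j][k][l] = value
    return R

R_sphere = riemann(christoffel_sphere, [1.0, 0.3])
R_plane = riemann(christoffel_polar, [2.0, 0.5])
print(R_sphere[0][1][0][1], math.sin(1.0) ** 2)
print(max(abs(R_plane[i][j][k][l]) for i, j, k, l in product(range(2), repeat=4)))
```

On the sphere the component $R^\theta_{\phi\theta\phi}$ comes out as $\sin^2\theta$, while in polar coordinates on the plane every component vanishes even though the $\Gamma^i_{jk}$ do not, in line with Theorem 2.05.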

2.09. Independent Components. The space of $\binom{1}{3}$-tensors on an $n$-dimensional
vector space has dimension $n^4$. Thus in coordinates $R$ will have on
a surface, 3-manifold or spacetime respectively, 16, 81 or 256 component
functions. The relevant number is in fact somewhat smaller. For instance
since $R_{ijkl} = -R_{jikl}$ we must always have $R_{iikl} = 0$. In fact (Exercise 4)
the space of $\binom{1}{3}$-tensors at each $p \in M$ with the symmetries required of a
curvature tensor has dimension $\frac{1}{12}n^2(n^2 - 1)$. There is no very natural choice
of $\frac{1}{12}n^2(n^2 - 1)$ basis vectors in terms of the $\partial_i$, however, for general $n$.
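The dimension count can be confirmed by brute force for small $n$. The sketch below is only an illustration with hypothetical helper names: it treats the $n^4$ lowered components $R_{ijkl}$ as unknowns, writes the identities A, B, C of 2.08 as homogeneous linear equations, and computes the rank exactly with rational arithmetic. It also checks that adding identity D does not change the rank, as one expects if D is an algebraic consequence of A, B and C.

```python
from fractions import Fraction
from itertools import product

def add(row, key, coeff):
    """Accumulate a coefficient so repeated index tuples are summed."""
    row[key] = row.get(key, Fraction(0)) + coeff

def constraints(n, include_d=False):
    """Rows (dicts: index tuple -> coefficient) expressing A, B, C (and D)."""
    rows = []
    for i, j, k, l in product(range(n), repeat=4):
        a, b, c = {}, {}, {}
        add(a, (i, j, k, l), 1); add(a, (i, j, l, k), 1)   # A: R_ijkl + R_ijlk = 0
        add(b, (i, j, k, l), 1); add(b, (j, i, k, l), 1)   # B: R_ijkl + R_jikl = 0
        add(c, (i, j, k, l), 1); add(c, (i, k, l, j), 1)   # C: cyclic sum over j, k, l
        add(c, (i, l, j, k), 1)
        rows += [a, b, c]
        if include_d:
            d = {}
            add(d, (i, j, k, l), 1); add(d, (k, l, i, j), -1)  # D: R_ijkl - R_klij = 0
            rows.append(d)
    return rows

def rank(rows):
    """Rank by Gaussian elimination over the rationals on sparse rows."""
    pivots = {}  # pivot column -> normalised row whose smallest column is the pivot
    for raw in rows:
        row = {k: v for k, v in raw.items() if v != 0}
        while row:
            col = min(row)
            if col not in pivots:
                lead = row[col]
                pivots[col] = {k: v / lead for k, v in row.items()}
                break
            factor = row[col]
            for k, v in pivots[col].items():
                new = row.get(k, Fraction(0)) - factor * v
                if new == 0:
                    row.pop(k, None)
                else:
                    row[k] = new
    return len(pivots)

def curvature_space_dimension(n):
    return n ** 4 - rank(constraints(n))

for n in (2, 3, 4):
    print(n, n ** 4, curvature_space_dimension(n), n * n * (n * n - 1) // 12)
```

For a surface, a 3-manifold and a spacetime this leaves 1, 6 and 20 independent components out of 16, 81 and 256.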

2.10. Bianchi’s Second Identity. One of the striking points in 2.01 was
that the difference between parallel transport along two curves from $p$ to
$q$ is given by the integral (suitably defined) of $R$ over any surface between
them, independently of the choice of surface. This fact absolutely requires
the “curvature form” viewpoint for its proof and clear exposition. But it
should recall a familiar fact to students of electromagnetism: the integral
of the “curl” of a given vector field over a surface in $\mathbb{R}^3$ depends only on
the boundary of the surface, by Stokes’s Theorem. This holds moreover
locally for the integral over the surface of any field $v$ whose divergence is
zero, since this implies that $v$ is the curl of some field, at least locally. (And
therefore, in topologically simple $\mathbb{R}^3$, globally - an implication that does not
hold in general, contrary to too many books.) So we have a condition on the
derivatives of $v$, zero divergence, integrating to an “independence of surface”
result. Something very similar applies here: $R$ is in a very precise sense a
generalised “curl” of $\nabla$, and the independence of its integral of the surface
integrated over, for a fixed boundary, has a local equivalent (analogous to
“zero divergence”) the condition

$$(\nabla_u R)(v, w) + (\nabla_v R)(w, u) + (\nabla_w R)(u, v) = 0$$

for $u, v, w \in T^1M$. ($\nabla_u R$ etc. are of course $\binom{1}{3}$-tensor fields like $R$, and like
$R$ can be treated as maps $T^1M \times T^1M \to L(T^1M; T^1M)$.) In components,
this becomes evidently

$$R^i_{jkl;h} + R^i_{jlh;k} + R^i_{jhk;l} = 0$$

which is the classical form of Bianchi’s second identity (sometimes known
simply as Bianchi’s identity). The proof is straightforward, and left to the
reader (Exercise 5). This equation is not obviously related to integrating 
curvature over surfaces, but in the “curvature form” context appropriate to 
such integration it does become so. 

The reader is probably familiar with a number of conservation laws, 
and should note that they all have their differential and integral forms. For 
example if v is a field of force it is locally equivalent to say “curl v = 0” or 
that “work done in going along a path depends only on its boundary” (that is, 
on its end points), and if it is a static magnetic field the fact that div v = 0 is 
equivalent to the fact that the magnetic flux through a surface depends only 
on the boundary (“between two surfaces with the same boundary, no lines of 
force get lost”). The similarity of all such laws, both in their differential and 
integral forms, cannot however be displayed without a coherent language for 
integration; it is part of the more perfect approach we mentioned in 2.01. 

2.11. Definition. If the curvature tensor on M is constant in the sense of 
VIII.7.10, then we say M has constant curvature (Exercise 6). 


Exercises X.2 

1. Suppose that $f(s, t)$ is defined for $(s, t) \in \mathbb{R}^2$, $s, t > 0$, and there exists
$x = \lim_{(s,t)\to(0,0)} f(s, t)$, in the sense of Exercise VII.1.1a (here $\mathbb{R}^2$ has
its usual topology, and $f$ takes values in any Hausdorff space $X$).
Show that both of

$$\lim_{s\to 0}\Big(\lim_{t\to 0} f(s, t)\Big) , \qquad \lim_{t\to 0}\Big(\lim_{s\to 0} f(s, t)\Big)$$

must exist and be equal to $x$ if the limits

$$\lim_{t\to 0} f(s, t) , \qquad \lim_{s\to 0} f(s, t)$$

exist whenever we fix $s$, $t$ respectively near enough to 0.

2. a) Show from the definition that

$$R(u, v)(t + t') = R(u, v)t + R(u, v)t'$$
$$R(u, v)(ft) = fR(u, v)t$$

for any vector fields $u$, $v$, $t$, $t'$ on $M$, $f : M \to \mathbb{R}$. Deduce by expressing
$t$ as $t^i\partial_i$, where $t^i : M \to \mathbb{R}$, that $(R(u, v)t)_p$ depends for fixed $u$, $v$
only on $t_p$, not $t$. Define $(R(u, v))_p$, and show it to be linear.

b) Show that $(u, v) \mapsto (R(u, v))_p$ is bilinear, and depends only on $u_p$
and $v_p$. (If $u'$, $v'$ have $u'_p = u_p$, $v'_p = v_p$, then $(u - u')_p = 0 = (v - v')_p$.
Consider $R(w, x)$ where $w_p = 0 = x_p$, and use bilinearity.)

c) Deduce that $(R(fu, gv)ht)_p = f(p)g(p)h(p)(R(u, v))_p t_p$. Define $R_p$.

d) Show that any map

$$R : T^1M \times T^1M \to L(T^1M; T^1M) ,$$

which preserves addition and satisfies

$$(*) \qquad R(fu, gv)ht = fgh\,R(u, v)t$$

for any $f, g, h : M \to \mathbb{R}$, can be specified by a $\binom{1}{3}$-tensor field.

e) Show that if $R$ as in (d) satisfies

$$R(u, v)t = \nabla_u\nabla_v t - \nabla_v\nabla_u t$$

when $u$ and $v$ commute, it satisfies

$$R(u, v)t = \nabla_u\nabla_v t - \nabla_v\nabla_u t - \nabla_{[u,v]} t$$

in general. (Express $u$, $v$ as sums of functions multiplying the $\partial_i$,
which commute, and use $(*)$.)

f) Discuss the relationship between checks like (d) and VIII.5.04, that 
something defined using values around p actually only uses values at p 
(and so defines a tensor), with the checks common in physics texts that 
a set of functions defined in terms of a coordinate system transforms 
correctly (and so defines a tensor). 

3. Deduce from the first Bianchi identity that for any $u, v, w, x \in T^1M$

E1 $\quad R(u, v)w \cdot x + R(v, w)u \cdot x + R(w, u)v \cdot x = 0$

Write down the corresponding E2, E3, E4 with the first terms
$R(x, u)v \cdot w$, $R(w, x)u \cdot v$, $R(v, w)x \cdot u$. Subtract (E3+E4) from
(E1+E2) and use 2.07 A,B to deduce that

$$R(x, u)v \cdot w = R(v, w)x \cdot u \quad \text{for any } u, v, w, x$$

which is just 2.07 D rewritten.

4. Express 2.07A,B,C as the condition that $R_p$ be in the kernel of a
suitable linear map $S : (T^1_3M)_p \to \mathbb{R}^m$, where $m = \frac{n^2}{12}(11n^2 + 1)$.
(Hint: one component of $S(R_p)$ might be $R^1_{212} + R^1_{221}$.) Show that $S$
is surjective, and deduce from I.2.10 that the space of tensors satisfying
2.07A,B,C,D has dimension $\frac{n^2}{12}(n^2 - 1)$.

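5. a) If $A$ is a tensor field on $M$ taking $k$ vector field arguments and
$u, v_1, \dots, v_k \in T^1M$, show that

$$\nabla_u A(v_1, \dots, v_k) = \nabla_u\big(A(v_1, \dots, v_k)\big) - \sum_{i=1}^{k} A(v_1, \dots, \nabla_u v_i, \dots, v_k)$$

from VIII.7.03.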

b) Use (a) and 2.07 to prove the second Bianchi identity in the form 
without coordinates, or use 2.08 and IX.7.08 to prove it in components. 
Show that the two forms are equivalent. 

6. If $M$ has constant curvature $R \neq 0$, can some chart make its components
$R^i_{jkl}$ constant?

3. Curved Surfaces 

3.01. Gaussian Curvature. When $n = 2$ the formula of 2.09 for the dimension
of the space of possible curvature tensors reduces to $\frac{1}{12} \cdot 4(4 - 1) = 1$, so
on a surface $R$ is essentially just a number proportional to area at each point.
This is in keeping with our discussion in 2.01; an “infinitesimal rotation” of 
a tangent plane is exactly a scalar “rate of turn” since rotations of a plane 
are so much simpler than those of higher spaces. This “rate of turn per area” 
or “density of rate of turn” is correspondingly much simpler than curvature 
in higher dimensions and its study is older, as witness its name of Gaussian 
curvature. (Riemann laid the foundations of tensor geometry a generation 
after Gauss’s work on surfaces, and some of the ideas we have presented date 
from as recently as the 1950’s.) Integrating this quantity is likewise much 
simpler. Even here we shall not cover the technicalities 2 in this volume but 
it is worth mentioning Gauss’s result on such integration. 

2 Not hard to find, since so many books start with the geometry of surfaces, usually 
embedded in $\mathbb{R}^3$. The best modern one is [do Carmo]. But in our view it is easier
to motivate R directly, and see why it reduces to a scalar on surfaces, than motivate 
Gaussian curvature separately by geometric ideas special to it and then let it explode 
into a fourth-degree tensor when n goes to 3. Moreover, many results such as Schur’s 
Theorem (§5) are true only when n > 2, so surfaces do not illustrate them. 
Fig. 3.2


The effect of parallel transport around a closed curve is a rotation of the 
tangent space through an angle equal to the integral of the curvature over 
any surface whose boundary it is, allowing for orientation. (It may be that 
there is no such surface, as for the curve c of Fig. 3.1. The study of when 
this happens is the beginning of homology theory, part of algebraic topology.) 
In particular, if the curve is a “geodesic triangle” (defined as “three points 
joined by geodesics” which reduces to meaning an ordinary triangle if the 
space is affine), we know how a tangent vector to a side is transported along 
the side and hence how any vector is, thanks to n = 2. With its three jumps 
at corners (treatment of “piecewise smooth” curves is one of the technicalities 
we are skipping) it is thus clear that the tangent vector to the boundary turns 
through (Fig. 3.2) an angle of 

$$(\pi - A) + (\pi - B) + (\pi - C) + \big(\text{integral of curvature over inside of triangle}\big)$$

when we go round once, ending where it began. Since a turn of $2\pi$ is no turn,
this gives Gauss’s result

$$A + B + C = \pi + \text{integral}$$

which reduces to the Euclidean result (and is equivalent to the parallel
postulate) for the plane with its usual, curvature-zero, metric, and generalises it
to curved surfaces.

Notice that on a saddle or spindle shape the angle sum is less than $\pi$,
corresponding to negative Gaussian curvature, and on a convex one such as a
sphere it is greater. (Thus, on the Earth there is a very right-angled triangle
with corners at the N. Pole, 0° N, 0° W and 0° N, 90° W, whose angle sum is
$\frac{3}{2}\pi$; for negative curvature, test with string some geodesics on the middle of
an adult human female chest.)
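Gauss’s result can be tried out on the unit sphere, where the curvature is 1 and the integral is just the triangle’s area. The sketch below is an illustration with hypothetical function names; it measures the corner angles directly from tangent vectors and, as an assumption not taken from the text, computes the area independently from the Van Oosterom and Strackee solid-angle formula $\tan(\Omega/2) = |a \cdot (b \times c)| / (1 + a\cdot b + b\cdot c + c\cdot a)$ for unit vectors $a$, $b$, $c$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def corner_angle(a, b, c):
    """Angle at vertex a between the great-circle arcs from a to b and a to c."""
    tb = unit(tuple(bi - dot(a, b) * ai for ai, bi in zip(a, b)))  # tangent towards b
    tc = unit(tuple(ci - dot(a, c) * ai for ai, ci in zip(a, c)))  # tangent towards c
    return math.acos(max(-1.0, min(1.0, dot(tb, tc))))

def angle_sum(a, b, c):
    return corner_angle(a, b, c) + corner_angle(b, c, a) + corner_angle(c, a, b)

def triangle_area(a, b, c):
    """Area of the spherical triangle on the unit sphere (solid-angle formula)."""
    triple = abs(dot(a, cross(b, c)))
    return 2.0 * math.atan2(triple, 1.0 + dot(a, b) + dot(b, c) + dot(c, a))

# The triangle of the text: N. Pole, (0 N, 0 W) and (0 N, 90 W).
pole, p0, p90 = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, -1.0, 0.0)
print(angle_sum(pole, p0, p90), 1.5 * math.pi)

# A less special triangle: angle sum minus pi should match the area.
a, b, c = unit((1.0, 0.2, 0.1)), unit((0.3, 1.0, 0.2)), unit((0.2, 0.1, 1.0))
print(angle_sum(a, b, c) - math.pi, triangle_area(a, b, c))
```

The octant triangle has three right angles and area $\pi/2$, an eighth of the sphere, so its angle sum is $\pi + \pi/2$.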
3.02. Deflection. Similarly, curvature can allow a “geodesic diangle” - two
points joined by two distinct geodesics as in Exercise IX.4.6, Exercise 1.6 - 
and the integral of the curvature over the surface between them gives the 
amount one is “deflected” relative to the other. For example, suppose on R 2 
we have a Riemannian metric with non-zero curvature (positive) only near the 
origin; such a metric is induced, for instance, by the embedding in Euclidean 
3-space shown in Fig. 3.3. Then the two geodesics shown meet twice, despite 
each lying entirely in a flat part of the space, because of the curvature between 
them. Thus curvature in a small region can affect the global geometry of the 
whole space. It is the four-dimensional, pseudo-Riemannian analogue of this 
effect that is being analysed in the extraordinarily careful measurements of
stellar positions made at every solar eclipse, since general relativity links the 
presence of matter to curvature (Chap. XII). Since the amount of curvature 
imposed by even a solar mass is not great, two geodesics that meet twice 
are not much “deflected”, hence are never very far apart, hence for there to 
be much curvature “between them” they must pass near the central region, 
which is why light following them is lost in sunlight except at eclipse. Indeed, 
since Earth is too close to the sun for null geodesics from the same star passing 
on either side to meet here, at least one must be blocked. So the deflection 
cannot be measured directly as a difference, but only as a change of apparent 
position; a very delicate job for such a small change. 

This foretaste of general relativistic effects via analogy with ordinary 
surfaces is offered in hopes that the reader may find it helpful, but the reader 
should treat it with caution. Firstly, because of the differences between
Riemannian and pseudo-Riemannian geometry (cf. the examples in IX.§5, §6),
though with due care the above discussion of triangles can be made precise 
almost as easily in the pseudo-Riemannian case. There one defines the “angle 
turned through” with the aid of cosh rather than cos (cf. IX.§6). Secondly, 



Fig. 3.3 
and more importantly, consider the differences between surfaces and higher
manifolds. We cannot for instance have a small difference between the
directions of two null geodesics through a point in two dimensions as we can
in three. Moreover Fig. 3.3 shows an effect simpler than gravitation can be,
since in general the presence of specified curvature within a bounded region
is incompatible with having zero curvature outside it. This is exactly
because of the independence of spanning surface we remarked on in 2.10. In
two dimensions this independence is fairly trivial. (Even in a - physically
uninteresting - compact 2-manifold we can find at most two spanning surfaces,
and in a non-compact case at most one.) But in $\mathbb{R}^3$, for instance, if non-zero
curvature is confined to the unit ball $B = \{(x, y, z) \mid x^2 + y^2 + z^2 < 1\}$ any
two curves from $p$ to $q$ that lie entirely outside $B$ have a spanning surface
also entirely outside $B$. On this the curvature to be integrated is identically
zero (hence parallel transport along the two curves gives the same results).
Thus the integral must vanish over any surface between them, even one that
does go through $B$, which severely limits the curvature we can have on $B$.
A straightforward 3-dimensional analogue of Fig. 3.3 is thus impossible. We
could allow curvature in a region $C = \{(x, y, z) \mid x^2 + y^2 < 1\}$ with a metric
on each $z$ = constant plane like that induced on one plane in Fig. 3.3 (though
to realise it by an embedding we would need four flat dimensions) and get
“deflected” geodesics as in Fig. 3.4 (Exercise 2). The independence of surface
of the integral of curvature between $f$ and $g$ then says that whether we go
through $C$ high or low we get the same answer. This is reasonable, smacking
of a conservation law for whatever is causing the curvature in $C$, if we
think of $z$ as time. But in a four-dimensional spacetime, any analogue that
confines the curvature to regions that are non-compact only in time keeps it
dodgeable by spanning surfaces.

We cannot then expect to describe gravity simply by making curvature 
a function of matter, vanishing in empty space. In the dimensions we live in, 
this would mean that geodesics outside a ball of matter would meet twice, 



Fig. 3.4 
or not, independently of whether the matter was there. Hardly a model for
gravitation, which can make satellites going East meet satellites going West 
in a regular fashion which is clearly related to the presence of the Earth. 


Exercises X.3 

1. Use Gauss’s theorem about geodesic triangles and prove or assume
that any Riemannian surface can be broken up into such triangles
(Fig. 3.5) to prove the Gauss-Bonnet Theorem: the integral of curvature
over a compact surface $S$ is equal to $2\pi$ times the Euler characteristic
$(v - e + f)$ of $S$, where $v$ is the number of vertices, $e$ the number
of edges, and $f$ the number of triangular faces. (Euler first showed
this number to be 2 for any such subdivision of $S^2$: it is independent
of the triangulation for any surface.)

This incidentally gives another proof of 1.04: since the curvature
must integrate to $4\pi$ it cannot be everywhere zero, so $S^2$ cannot be
locally flat. Of all compact surfaces, in fact, only the torus and Klein
bottle have Euler characteristic zero, so only these are possibilities for
locally flat metrics; both in fact can possess them.

The Gauss-Bonnet Theorem is the earliest of all results relating 
curvature to an algebraic-topological quantity. 

2. a) Let $(r, \theta, z)$ be cylindrical coordinates on $\mathbb{R}^3$. Show that the Euclidean
metric on $\mathbb{R}^3$ is given by the line element

$$(ds)^2 = (dz)^2 + r^2(d\theta)^2 + (dr)^2$$

where $r \neq 0$ (cf. Exercise IX.5.1).

b) Let $f : \mathbb{R} \to \mathbb{R}$ be smooth and satisfy $f(0) = 1$, $f(x) = .99$ if $x^2 > 1$.
(Hint: try f(x) = ^(.99 + e ( ^ r * ) ), for x 2 < 1.) 

Define a metric $G$ on $\mathbb{R}^3$ by the line element formula

$$(ds)^2 = (dz)^2 - r^2 f(r)(d\theta)^2 - (dr)^2 \quad \text{for } r \neq 0.$$

Show that it extends smoothly to points where $r = 0$ (though this particular
coordinate representation does not). Give its form in $(x, y, z)$
coordinates.

Fig. 3.5

c) Compute the Levi-Civita $\Gamma^i_{jk}$ for $G$, and the components of the Riemann
tensor, and give the geodesic equation in terms of $r$, $\theta$ and $z$
only, at points where $r > 1$.

d) Give explicitly a typical geodesic $c : \mathbb{R} \to \mathbb{R}^3$ with respect to $G$ that
does not pass through $C = \{(r, \theta, z) \mid r < 1\}$.

(Hint: reduce the problem to that of geodesics in the plane z = 0, using 
Exercise IX.1.4. Prove that the geometry of this plane is that of the 
surface of Fig. 3.3 in Euclidean 3-space, up to the sign of the metric, 
where the conical part is generated by lines at cos'"(.99) to the z-axis. 
Find a map from U C R 2 to this cone, (r,0) (r,^p#,?) £ R 3 , by 

which straight lines correspond to geodesics.) 

e) There are two distinct timelike geodesics with this metric, from
$(50, 0, 0)$ to $(60, \pi, 200)$, that do not pass through $C$.

f) There are no geodesics in this manifold with domain $\mathbb{R}$ and image
contained in $\{(r, \theta, z) \mid 1 < r < k\}$, for any $k \in \mathbb{R}$.

g) Show that while passing geodesics are “deflected by $C$”, a curve of the
form $t \mapsto (r, \theta, t)$ for $r$, $\theta$ fixed is a geodesic. (This geometry does not
make things “fall”.)

3. Gauss was so pleased to discover that his curvature depends only on 
a surface’s metric tensor - not on its embedding in R 3 - he called the 
result his remarkable theorem , or Theorema Egregium. The name has 
stuck. Why is VIII.6.07 a generalisation of the Theorema Egregium? 
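The Euler characteristic $v - e + f$ appearing in Exercise 1 is easy to compute for an explicit triangulation. The sketch below is an illustration only: the function names and the particular triangulations (a tetrahedron and an octahedron for $S^2$, and a grid of squares with opposite sides identified, each cut into two triangles, for the torus) are choices made here, not taken from the text.

```python
from itertools import product

def euler_characteristic(faces):
    """v - e + f for a surface given as a list of triangles (vertex triples)."""
    vertices = {v for face in faces for v in face}
    edges = {frozenset(edge)
             for a, b, c in faces
             for edge in ((a, b), (b, c), (c, a))}
    return len(vertices) - len(edges) + len(faces)

def tetrahedron():
    return [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

def octahedron():
    # Vertices 0,1 on the x-axis, 2,3 on the y-axis, 4,5 on the z-axis;
    # each face takes one vertex from every axis.
    return [(a, b, c) for a, b, c in product((0, 1), (2, 3), (4, 5))]

def torus(m):
    """m x m grid of squares with opposite sides identified, each square cut in two."""
    faces = []
    for i, j in product(range(m), repeat=2):
        a, b = (i, j), ((i + 1) % m, j)
        c, d = (i, (j + 1) % m), ((i + 1) % m, (j + 1) % m)
        faces += [(a, b, d), (a, d, c)]
    return faces

print(euler_characteristic(tetrahedron()), euler_characteristic(octahedron()))
print(euler_characteristic(torus(3)), euler_characteristic(torus(4)))
```

Both triangulations of the sphere give 2, so the curvature integrates to $4\pi$, and the torus gives 0, consistent with its admitting a locally flat metric.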


4. Geodesic Deviation 

Suppose in a spacetime we have $F$, a parametrised surface (Definition IX.3.03)
with each $s$-constant curve $F_s$ a timelike geodesic parametrised by arc length,
and each $F_t$ a null geodesic. Then suppose an observer $Q$ following $F_{s'}$ of
Fig. 4.1 is watching a particle $P$ following $F_s$. He watches via light rays.
That is, information about $P$ at point $A_0$ in spacetime reaches $Q$ at $A_1$, by
parallel transport along $F_t$. Information about $P$ at $A'_0$ comes similarly to
$Q$ at $A'_1$ along $F_{t'}$. Suppose $Q$’s interest is in $P$’s velocity $F_s^*$. (That is, $P$’s
spacetime velocity, which can be turned into a “space per time” velocity by
a choice of chart - alias frame of reference, sometimes.) This by definition of
“geodesic” is parallel transported along $F_s$. Then we see that the change he
records is exactly the difference between parallel transport of $F_s^*(t)$ to $A'_1$ via $A_1$
or $A'_0$, since $Q$ “remembers” the previous value by parallel transport along
his own world line $F_{s'}$. So the total acceleration $Q$ perceives in $P$ between
$A_0$ and $A'_0$ is given by the “total curvature”, in the sense of 2.01, of the piece
of the surface bounded by the four bits of geodesic.

Going to the limit, then, the relative acceleration of two “infinitely close” 
geodesics is given directly by the curvature tensor. (In practice this means 
“sufficiently close”; consider the definition of limits, and the bars to absolute 
accuracy of measurement.) In particular R tells us whether geodesics are 
locally inclined to approach or separate; it also describes how they move 
around each other. 
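A simple Riemannian instance can be checked by hand or by machine. The sketch below is an illustration under assumptions that go beyond this section: it takes two meridians of the unit sphere leaving the pole at a small angle `eps` to each other, uses the exact great-circle distance $d(t) = 2\arcsin(\sin t \, \sin(\varepsilon/2))$ between their points at arc length $t$, and compares the relative acceleration $d''(t)$, estimated by a central second difference, with $d(t)$ itself. For curvature 1 one expects $d''/d$ close to $-1$, that is, neighbouring geodesics accelerating towards each other in proportion to their separation. The function names are hypothetical.

```python
import math

def separation(t, eps):
    """Great-circle distance between the points at arc length t from the pole
    on two meridians of the unit sphere whose longitudes differ by eps."""
    return 2.0 * math.asin(math.sin(t) * math.sin(eps / 2.0))

def relative_acceleration_ratio(t, eps=0.01, h=1e-3):
    """Second derivative of the separation (central difference) over the separation."""
    d0 = separation(t, eps)
    d2 = (separation(t + h, eps) - 2.0 * d0 + separation(t - h, eps)) / h ** 2
    return d2 / d0

for t in (0.3, 0.7, 1.2):
    print(t, relative_acceleration_ratio(t))
```

The ratio stays near $-1$ at every point along the geodesics, whatever the current separation, which is the local character of the effect described above.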

With a number of “sufficiently close” geodesics passing near a point, 
conversely, R itself can be determined to any given level of accuracy. 

Physically this means that for a bunch of freely-falling particles with
negligible gravitational effect on each other, the relative motions (produced in
Newtonian terms by “tidal forces”) which spread out, flatten, rotate, etc. 
the bunch are in this description determined by the local value of R, and 
themselves suffice - given enough particles - to determine R in their vicinity. 
The curvature tensor is thus a physical “quantity” which can be directly 
determined from measurements in the neighbourhood of a point, not just 
a construct from the metric tensor G. (It would be hard to measure G 
accurately enough locally to get useful experimental values for the second 
derivatives of it involved in R.) 

Computing the “relative motion” of two actual geodesics, for which we
could take the limit as they approach, involves integral techniques. We
therefore defer the precise treatment of geodesic deviation.
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5. Sectional Curvature 

Computation with the whole curvature tensor is somewhat unwieldy in
components, particularly as 2.09 cannot in general be utilised in any convenient
way to reduce the number of component functions to $\frac{n^2}{12}(n^2 - 1)$. That the
$n^2(2n - 1)$ functions $R_{iikl}$ and $R_{ijkk}$ are necessarily zero allows us to restrict
attention to the other $n^2(n - 1)^2$ of the $R_{ijkl}$, but on a 4-D spacetime this
still leaves 144 functions. For the rest of this chapter, then, we consider ways
of “cutting down” the Riemann tensor to get various kinds of significant
information.
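The figure of 144 can be checked by brute force. The following Python sketch (the function names are illustrative, not taken from the text) counts the index quadruples $(i, j, k, l)$ that are not forced to vanish by the two skew-symmetries, and evaluates the formula quoted above for the number of independent component functions.

```python
from itertools import product

def not_forced_zero(n):
    """Count index quadruples (i, j, k, l) with i != j and k != l."""
    return sum(1 for i, j, k, l in product(range(n), repeat=4)
               if i != j and k != l)

def independent_components(n):
    """The count n^2 (n^2 - 1) / 12 quoted in the text."""
    return n * n * (n * n - 1) // 12

count_4d = not_forced_zero(4)               # components left after skew-symmetry
independent_4d = independent_components(4)  # independent components for n = 4
```

For n = 4 the first count agrees with $n^2(n-1)^2$, which is far larger than the number of genuinely independent functions; this is the gap the remaining symmetries of 2.07 close.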

5.01. Definition. One of these ways is, since curvature is so much simpler for
surfaces, to consider the surfaces $S$ in $M$ passing through the point $p \in M$ of
interest: what curvature does that of $M$ impose on them? To avoid curvature
that is not imposed, we want a curve in $S$ through $p$ to be as straight as $M$
permits - a geodesic. Thus we consider the images under the exponential map
$\exp_p$ of non-degenerate planes (2-dimensional vector subspaces) of $T_pM$, and
examine their curvature with the induced metric. This turns out to be, at $p$,
a restriction of the curvature tensor on $M$. More precisely, suppose
$u_p, v_p, w_p, x_p$ are vectors in such a plane $P \subset T_pM$, $\mathbf{R}_M$ is the Riemann
tensor of the metric $G$ on $M$, and $\mathbf{R}_P$ is the Riemann tensor on $P$ of the
metric (non-degenerate in a neighbourhood of $0 \in P$) given by

  *   $t \cdot s = G(D_x\exp_p(t), D_x\exp_p(s))$ for $x \in P$, $t, s \in T_xP$.

Then we have

  **  $(\mathbf{R}_P(0)(u_p, v_p))w_p \cdot x_p = (\mathbf{R}_M(p)(u_p, v_p))w_p \cdot x_p$

considering the vectors involved as on the left tangent vectors to $P$ at 0, on
the right to $M$ at $p$, in the natural way (Exercise 1).

Now if $u_p$, $v_p$ are not linearly independent, both sides vanish, by
skew-symmetry. If they are, we know all such $\mathbf{R}_P(u_p, v_p)w_p \cdot x_p$ if we know
$\mathbf{R}_P(u_p, v_p)u_p \cdot v_p$, by skew-self-adjointness, since $u_p$, $v_p$ form a basis for $T_0P$.
(Just write $w_p = w^1u_p + w^2v_p$, $x_p = x^1u_p + x^2v_p$ and expand.) So only this
matters for the curvature of $\exp_p(P)$.
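Written out, the expansion suggested in the parenthesis runs as follows. This is a sketch that uses only bilinearity and skew-self-adjointness, which gives $\mathbf{R}_P(u_p, v_p)y \cdot y = 0$ for every $y$ and hence $\mathbf{R}_P(u_p, v_p)v_p \cdot u_p = -\mathbf{R}_P(u_p, v_p)u_p \cdot v_p$:

```latex
\begin{aligned}
\mathbf{R}_P(u_p,v_p)w_p\cdot x_p
 &= w^1x^1\,\mathbf{R}_P(u_p,v_p)u_p\cdot u_p
  + w^1x^2\,\mathbf{R}_P(u_p,v_p)u_p\cdot v_p \\
 &\quad + w^2x^1\,\mathbf{R}_P(u_p,v_p)v_p\cdot u_p
  + w^2x^2\,\mathbf{R}_P(u_p,v_p)v_p\cdot v_p \\
 &= (w^1x^2 - w^2x^1)\,\mathbf{R}_P(u_p,v_p)u_p\cdot v_p .
\end{aligned}
```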

In 2.01 we used “proportional to area of a parallelogram” as a way of
seeing “bilinear and skew-symmetric in the vectors defining the parallelogram”
geometrically. This is legitimate even without a specific measure of area in
mind, since we can only change our measure by scalar multiplication, which
leaves such proportionality intact. But in a plane $P$ with a metric tensor we
do have a natural measure of area. Namely, choose any orthonormal basis
$b_1, b_2$ for $P$ and set the area of the parallelogram defined by $u$, $v$ equal
to the determinant of the map $P \to P$ defined by $\alpha b_1 + \beta b_2 \mapsto \alpha u + \beta v$.
(cf. I.3.05 et seq. By Exercise IV.2.2b it is immediate that this does not
depend on the choice of orthonormal $b_1, b_2$ except up to sign, which will
disappear in the squaring that follows.) Call this area $\|u, v\|$. It then follows
from skew-symmetry and skew-self-adjointness that the number

$$k(P) = \frac{\mathbf{R}(u, v)u \cdot v}{\|u, v\|^2}$$

depends only on the plane $P$ defined by $u$ and $v$ in $T_pM$, if they are linearly
independent (Exercise 2). (If $u = \lambda v$, neither $P$ nor this number is defined.)
We call $k(P)$ the sectional curvature of $M$ at $p$ for the section $P \subset T_pM$.
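As a numerical illustration (a sketch, not part of the text's development), the following Python/NumPy fragment evaluates this quotient for the model tensor of Exercise 5b on Euclidean $\mathbb{R}^3$, and checks that the value is unchanged when $u, v$ are replaced by another basis of the same plane. The helper names, the value $\kappa = 2$, and the index convention $\mathbf{R}(u,v)w \cdot x = R_{ijkl}x^iw^ju^kv^l$ are assumptions made for the example.

```python
import numpy as np

def model_curvature(g, kappa):
    # R_{ijkl} = kappa (g_il g_jk - g_ik g_jl), the form quoted in Exercise 5b
    return kappa * (np.einsum("il,jk->ijkl", g, g)
                    - np.einsum("ik,jl->ijkl", g, g))

def riemann_form(R, u, v, w, x):
    # R(u, v)w . x = R_{ijkl} x^i w^j u^k v^l  (assumed index convention)
    return np.einsum("ijkl,i,j,k,l->", R, x, w, u, v)

def area_squared(g, u, v):
    # ||u, v||^2 = |(u.u)(v.v) - (u.v)^2| on a non-degenerate plane (Exercise 5a)
    return abs((u @ g @ u) * (v @ g @ v) - (u @ g @ v) ** 2)

def sectional_curvature(R, g, u, v):
    return riemann_form(R, u, v, u, v) / area_squared(g, u, v)

g = np.eye(3)
R = model_curvature(g, kappa=2.0)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

k_first_basis = sectional_curvature(R, g, u, v)
k_second_basis = sectional_curvature(R, g, 2 * u + 3 * v, u - v)  # same plane
```

Both calls return the same number, as the text asserts; here it equals the constant $\kappa$ because the model tensor was chosen to have constant curvature.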

Now suppose we know $k(P)$ for any $P$. This determines $\mathbf{R}(u, v)u \cdot v$ for
any pair $u$, $v$ by

$$\mathbf{R}(u, v)u \cdot v = \|u, v\|^2\, k(P(u, v))$$

where $P(u, v)$ is the plane of $u$ and $v$. Then this determines $\mathbf{R}(u, v)w \cdot x$
for any $w, x \in T_pM$ (not just in $P(u, v)$ as above), as a consequence of the
symmetry 2.07D of $\mathbf{R}(u, v)w \cdot x$ in the pairs $(u, v)$ and $(w, x)$. This fact is
very similar to the polarization identity (Exercise IV.1.7d: if a bilinear form
is symmetric we can express its value on any pair $(x, y)$ in terms of its values
on pairs of the form $(z, z)$) but it is somewhat more awkward to construct a
formula as we are dealing with “bivectors” rather than vectors. Thus the fact
that the $\mathbf{R}(u, v)u \cdot v$ determine the $\mathbf{R}(u, v)w \cdot x$ (and hence the function $k$
on the space of planes determines $\mathbf{R}$) is most quickly established less directly
(Exercise 3).

(The analogy with the polarization identity helps to avoid a curiously
common error. A symmetric bilinear form $A$ on $X$ has its values $A(x, y)$ fixed
if we know all $A(z, z)$, $z \in X$. A somewhat smaller set of $z$ will suffice, but it
is not enough to know only $A(b_i, b_i)$, $i = 1, \dots, n$, for a given basis $b_1, \dots, b_n$
(Exercise 4). In exactly the same way, knowing all the $\mathbf{R}(u, v)u \cdot v$ suffices
to know $\mathbf{R}$ completely, but knowing all the $\mathbf{R}(\partial_i, \partial_j)\partial_i \cdot \partial_j = -R_{ijij}$ does not
suffice to determine $R_{ijkl}$ in general. This should be clear dimensionally. The
$\frac{n}{2}(n - 1)$ numbers $R_{ijij} = R_{jiji}$ can hardly suffice if $n > 2$ to fix a point in
the $\frac{n^2}{12}(n^2 - 1)$-dimensional space of possible curvature tensors. But it is nice
to have a simpler analogue of the way they fail.)

Thus the $\binom{1}{3}$-tensor field $\mathbf{R}$ on $M$ can be expressed in terms of a
$\binom{0}{0}$-tensor field (real-valued function) on the $(3n - 4)$-dimensional manifold of
planes in the tangent spaces of $M$. It is often easier to deal with functions
than with tensors, and there is a surprising result that concerns $k$ directly:

5.02. Schur’s Theorem. If $\dim M \geq 3$ and we have a function $\kappa : M \to \mathbb{R}$
such that for any non-degenerate plane $P \subset T_pM$ we have $k(P) = \pm\kappa(p)$
according as $G$ induces a definite or indefinite metric on $P$, then $\kappa$ is
constant. (Thus, if sectional curvature is at every point independent of all but
the signature of the section it is independent also of the point.)

Proof. Exercise 5. □

5.03. Corollary. M satisfying the above conditions is of constant curvature 
(Definition 2.11). □ 

It is a little surprising that 5.02 holds even for M pseudo-Riemannian. 
The definition of k(P) required P non-degenerate (so that we could find 
for P an orthonormal basis), and in a pseudo-Riemannian manifold of 
dimension > 2 every tangent space contains degenerate planes. On these 
we have no more a natural notion of “area” than we have of “length” on 
a null line, so that while we can still reduce R to its values of the form 
R(u, v)u • v we cannot always further reduce these to a sectional curvature. 
(Note that $P(u, v)$ may be degenerate independently of whether either or
both of u, v are null.) But it turns out that there are not, in a topological 
sense, “enough” planes where k is undefined, to block the theorem. 


Exercises X.5 

1. Prove the equation ** of 5.01. (You need to relate the connection on
$M$ for $G$ to that on $P$ for the metric given by *. Notice that in general,
$M$-parallel transport of tangent vectors to a surface $S$ in $M$ need not
agree with $S$-parallel transport, along a curve in $S$ - think of $M = \mathbb{R}^3$,
$S = S^2$ - thus there is something to prove. This is one place where a
component argument, with the right chart, has its advantages.)

2. If $w = w^1u + w^2v$ and $x = x^1u + x^2v$ are linearly independent, then

$$\frac{\mathbf{R}(w, x)w \cdot x}{\|w, x\|^2} = \frac{\mathbf{R}(u, v)u \cdot v}{\|u, v\|^2} .$$

3. a) Any $\binom{1}{3}$-tensor $\mathbf{S}$ on a metric vector space $X$ which satisfies A, B, C
(and hence also D) of 2.07, together with

$$\mathbf{S}(u, v)u \cdot v = 0 , \quad \forall u, v \in X,$$

is identically zero.

b) Deduce that if two $\binom{1}{3}$-tensors $\mathbf{Q}$, $\mathbf{R}$ on $X$ satisfying A, B, C of 2.07
have

$$\mathbf{Q}(u, v)u \cdot v = \mathbf{R}(u, v)u \cdot v \quad \forall u, v \in X$$

then $\mathbf{Q} = \mathbf{R}$.

c) Deduce that $\mathbf{Q} = \mathbf{R}$ even if “$\forall u, v \in X$” is replaced by “$\forall u, v \in X$
linearly independent and such that the plane containing $u$, $v$ is
non-degenerate”. (Hint: take a sequence of non-degenerate planes tending
to a typical degenerate.)
4. Show that on $\mathbb{R}^2$ with the usual basis $e_1$, $e_2$, if we define

$$G((x^1, x^2), (y^1, y^2)) = x^1y^1 + x^2y^2 ,$$

$$H((x^1, x^2), (y^1, y^2)) = x^1(y^1 + \tfrac{1}{2}y^2) + x^2(\tfrac{1}{2}y^1 + y^2) ,$$

then both $G$ and $H$ are symmetric bilinear forms (in fact inner products)
with $\|e_1\|_G = \|e_1\|_H = \|e_2\|_G = \|e_2\|_H = 1$, but $G \neq H$.
Draw the sets $\{\, v \mid \|v\|_G = 1 \,\}$, $\{\, v \mid \|v\|_H = 1 \,\}$.

5. a) Prove that $\|u, v\|^2 = (u \cdot u)(v \cdot v) - (u \cdot v)^2 = (g_{ac}g_{bd} - g_{ad}g_{bc})u^av^bu^cv^d$,
if $G$ restricts to a definite metric on the plane of $u$ and $v$, minus this
if indefinite.

b) Use Exercise 3c to show that if for all non-degenerate planes $P \subset T_pM$,
$k(P) = \kappa(p)$ for $P$ entirely spacelike, and $k(P) = -\kappa(p)$ otherwise,
then

$$\mathbf{R}(u, v)w \cdot x = \kappa(p)[(u \cdot w)(v \cdot x) - (u \cdot x)(v \cdot w)] ,$$

or in coordinates

$$R_{ijkl} = \kappa(p)(g_{il}g_{jk} - g_{ik}g_{jl}) , \qquad R^i_{jkl} = \kappa(p)(\delta^i_l g_{jk} - \delta^i_k g_{jl}) .$$

c) Use (b), Ricci’s Lemma (VIII.7.06), and the second Bianchi identity to
show that if $\dim M \geq 3$, $\partial_h\kappa = 0$ for all $h$. Deduce that $\kappa$ is constant.

d) Why is the assumption of the theorem trivially true for a surface? Give
a counter-example in this dimension.

6. Show that the converse of 5.03 is false by proving that $S^2 \times \mathbb{R}$ with
its usual Riemannian metric has constant curvature, but $k(P)$ is not
independent of $P \subset T_pM$.


6. Ricci and Einstein Tensors 

In this section we “cut down” the Riemann tensor to the part of the curvature 
that we shall associate with the presence of matter at a point, in Chapter XII. 

6.01. Definition. The Ricci transformation at $p$ with respect to a tangent
vector $v \in T_pM$ is the map

$$\mathbf{R}_v : T_pM \to T_pM : u \mapsto \mathbf{R}(u, v)v .$$

Evidently this is linear for each $v$, and if we know $\mathbf{R}_v$ for all $v$ we know
in particular

$$-\mathbf{R}_u(v) \cdot v = \mathbf{R}(u, v)u \cdot v = -\mathbf{R}(u, v)v \cdot u = -\mathbf{R}_v(u) \cdot u$$

for all $u$, $v$; hence by Exercise 5.3 the Ricci transformations suffice to
determine $\mathbf{R}$.

Fig. 6.1

Geometrically, “$\mathbf{R}_v$ takes each $u$ to the difference produced in $v$ by
parallel transport around the infinitesimal parallelogram fixed by $u$ and $v$
(Fig. 6.1)” - a statement the reader is left to make precise by an appropriate
reexpression in terms of limits.

Thus $\mathbf{R}_v$ gives us a measure of how curved $M$ is in each of the planes
containing $v$, and hence it is a kind of “curvature along $v$”. But it is still a bit
elaborate for many purposes, so we sacrifice some information by reducing it
to


6.02. Definition. The Ricci curvature $R_v$ of $M$ along a tangent vector $v$ is
the trace $\operatorname{tr}\mathbf{R}_v$ of the corresponding Ricci transformation $\mathbf{R}_v$.

Now, the trace of a linear operator is the sum of the diagonal elements
in its matrix, with respect to any basis whatever (cf. I.3.14). So let us choose
a basis to make the geometrical interpretation of this example as simple as
possible.

If $v$ is non-null, we have $v = \lambda w$ for some unit vector $w$ and $\mathbf{R}_v = \lambda^2\mathbf{R}_w$.
Extend $w$ to an orthonormal basis $w = b_1, b_2, \dots, b_n$. In the notation of §5,
we then have $\|b_i, w\|^2 = 1$, for all $i \neq 1$. Now in an orthonormal basis the
$i$-th component $x^i$ of any vector $x$ is $\pm x \cdot b_i$. ($\pm$ depending on whether $b_i$ is
timelike or spacelike. If $b_i \cdot b_i = -1$, that is minus the $i$-th component of $b_i$;
other vectors go likewise.)

Suppose $v$ is timelike and $G$ has signature $2 - \dim M$ (the simplest and
most interesting non-Riemannian case). Then the $i$-th diagonal entry,
$i \neq 1$, in the matrix of $\mathbf{R}_v$ for this basis is the $i$-th component of $\mathbf{R}_v(b_i)$, which
is thus

$$-\mathbf{R}_v(b_i) \cdot b_i = \mathbf{R}(b_i, v)b_i \cdot v = \lambda^2\,\mathbf{R}(b_i, w)b_i \cdot w = \lambda^2\,\frac{\mathbf{R}(b_i, w)b_i \cdot w}{\|b_i, w\|^2} \qquad \text{since } \|b_i, w\|^2 = 1 .$$

That is $\lambda^2$ times the sectional curvature for a section in the plane of $b_i$ and
$v$. The first diagonal entry is

$$\mathbf{R}(b_1, v)v \cdot b_1 = \mathbf{R}(w, v)v \cdot w = \lambda\,\mathbf{R}(w, w)v \cdot w = 0 \cdot w = 0 .$$

Thus the trace $R_v$ is $\lambda^2$ times the sum of the sectional curvatures of $M$ in
any orthogonal set of $(n - 1)$ planes containing $v$ (the $\lambda^2$ just allowing for
the size of $v$). A similar discussion shows that if $M$ is Riemannian we get
$-\lambda^2$ times the sum of the curvatures.

The Ricci curvature $R_v$ thus gives in either case (dividing appropriately
by $\pm\lambda^2(n - 1)$) a kind of “arithmetic mean curvedness” of $M$ in the direction
$v$ at $p$. If $G$ induces an indefinite metric on $v^\perp$ the mean is oddly weighted
by + and - signs, and if $v$ is null we have no “normalising factor” $\lambda^2$, but
$R_v$ is still the same thing: a convenient scalar measure of curvedness along $v$.
(If we had defined sectional curvatures using $(u \cdot u)(v \cdot v) - (u \cdot v)^2$ instead of
$\|u, v\|^2$ - cf. Exercise 5.5a - we would get $-\lambda^2 \times$ (their sum) for all non-null $v$.)

In a similar way we can further “average” the curvature at $p$ by adding
the $R_{v_i}$ for any choice of orthonormal basis $v_1, \dots, v_n$ for $T_pM$, but to see why
the result does not depend on the choice we backtrack slightly, generalising
the definitions above for algebraic convenience.

6.03. Definition. The Ricci transformation at $p$ with respect to a pair of
vectors $u, v \in T_pM$ is the linear map

$$\mathbf{R}_{u,v} : T_pM \to T_pM : w \mapsto \mathbf{R}(w, u)v .$$

(Thus the $\mathbf{R}_v$ of 6.01 becomes short for $\mathbf{R}_{v,v}$.)

The Ricci tensor of $M$ at $p$ is the bilinear map

$$\tilde{R}_p : T_pM \times T_pM \to \mathbb{R} : (u, v) \mapsto \operatorname{tr}\mathbf{R}_{u,v} ,$$

so that the Ricci curvature $R_v$ is exactly $\tilde{R}_p(v, v)$. Clearly the Ricci tensor
$\tilde{R}$ is a $\binom{0}{2}$-tensor, and we have

6.04. Lemma. The Ricci tensor $\tilde{R}$ is symmetric; $\tilde{R}(u, v) = \tilde{R}(v, u)$.

Proof. Choose an orthonormal basis $b_1, \dots, b_n$ for $T_pM$. Then the $i$-th
diagonal entry in the matrix $[\mathbf{R}_{u,v}]$ becomes, using $\sigma = b_i \cdot b_i$ as a sign factor,

$$\begin{aligned}
[\mathbf{R}_{u,v}]^i_i = \sigma\,\mathbf{R}_{u,v}(b_i) \cdot b_i &= \sigma\,\mathbf{R}(b_i, u)v \cdot b_i && \text{by definition} \\
&= \sigma\,\mathbf{R}(v, b_i)b_i \cdot u && \text{by 2.07 D} \\
&= \sigma\,\mathbf{R}(b_i, v)u \cdot b_i && \text{by 2.07 A, B} \\
&= \sigma\,\mathbf{R}_{v,u}(b_i) \cdot b_i .
\end{aligned}$$

Thus the individual diagonal entries in the matrix are symmetric
functions of $u$ and $v$, hence so is their sum, $\tilde{R}$. □

6.05. Corollary. The Ricci tensor is determined by the Ricci curvatures,
and vice versa.

Proof. By the polarisation identity, since the Ricci tensor $\tilde{R}$ is symmetric,

$$\tilde{R}(u, v) = \tfrac{1}{4}R_{u+v} - \tfrac{1}{4}R_{u-v} .$$

The converse is trivial. □
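For completeness, the polarisation step can be written out as follows. This is a sketch in which $\tilde{R}$ denotes the Ricci tensor and $R_v = \tilde{R}(v, v)$ the Ricci curvature; only bilinearity and the symmetry of Lemma 6.04 are used.

```latex
\begin{aligned}
R_{u+v} - R_{u-v}
 &= \tilde{R}(u+v,\,u+v) - \tilde{R}(u-v,\,u-v) \\
 &= \bigl(R_u + 2\tilde{R}(u,v) + R_v\bigr) - \bigl(R_u - 2\tilde{R}(u,v) + R_v\bigr) \\
 &= 4\,\tilde{R}(u,v) .
\end{aligned}
```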

Algebraically and manipulatively the Ricci tensor is more convenient
than the Ricci curvature, though geometrically the latter is a simpler idea.
(This is just like the relation between an inner product $G$ and the length
$\sqrt{G(v, v)}$ that it defines for single vectors, except that there is no need to
look at the square root of $\tilde{R}(v, v)$.)

6.06. Definition. The scalar curvature $R(p)$ of $M$ at $p$ is the contraction
twice over of the $\binom{2}{2}$-tensor $G^* \otimes \tilde{R}$, where $\tilde{R}$ is the Ricci tensor
(cf. IV.1.12). (Since both $G^*$ and $\tilde{R}$ are symmetric, the choice of which
covariant factors to contract with which contravariant ones makes no difference.)

6.07. Components. It is immediate that if $u = u^i\partial_i$, $v = v^i\partial_i$, $w = w^i\partial_i$,

$$\mathbf{R}_{u,v}(w) = \mathbf{R}(w, u)v = \mathbf{R}(w^k\partial_k, u^l\partial_l)(v^j\partial_j) = u^lv^jw^kR^i_{jkl}\,\partial_i$$

(cf. 2.08), so the components of the Ricci transformation $\mathbf{R}_{u,v}$ are $u^lv^jR^i_{jkl}$,
those of $\mathbf{R}_v$ are $v^lv^jR^i_{jkl}$. It follows that the Ricci tensor $\tilde{R}$ satisfies

$$\tilde{R}(u, v) = \operatorname{tr}[u^lv^jR^i_{jkl}] = u^lv^jR^i_{jil} ,$$

and the components of the Ricci tensor are the sums $R^i_{jil}$ ($= R^i_{lij}$, by
symmetry). These are normally denoted by $R_{jl}$ (or $R_{ij}$, $R_{\alpha\beta}$ etc.) the fact of
having only two indices sufficing to distinguish them from the components of
the Riemann tensor.

Another useful expression of them is

$$R_{ij} = R^h_{ihj} = \delta^l_h R^h_{ilj} = g^{lk}g_{kh}R^h_{ilj} = g^{kl}R_{kilj} .$$

6.08. Warning. Some authors define $R_{ij} = R^h_{ijh}$. Since $R^i_{kjh}$ is
skew-symmetric in $j$ and $h$, this gives minus the Ricci tensor we have defined.
If spacetime is given a metric of signature +2 - spacelike vectors having $v \cdot v$
positive - then the same Riemann tensor $\mathbf{R}$ results (VIII.6.08). The sign of the
Ricci tensor $\tilde{R}$ depends on which contraction is used, and that of $G^* \otimes \tilde{R}$ and
hence of its contraction, the scalar curvature $R$, depends on both these choices.
If both choices are opposite to ours, the result for $R$ is the same; if only one,
the result is minus our scalar curvature.

6.09. Many Contractions. The trace function is equivalent to contraction
(V.1.12) so both the Ricci tensor $\tilde{R}$ and the scalar curvature $R$ are contractions
of the Riemann tensor $\mathbf{R}$. It follows easily from the symmetries of $\mathbf{R}$ that any
other contraction is zero or expressible in terms of the Ricci tensor, so no
other contraction can catch any of the information lost in taking this one.
Clearly in general information is lost (we consider what in §7) as we are going
from the $\frac{n^2}{12}(n^2 - 1)$-dimensional space of possible curvature tensors at a
point to the $\frac{n}{2}(n + 1)$-dimensional space of symmetric bilinear forms.

The scalar curvature at $p$ is a sum of $n^2$ terms, $g^{ij}R_{ij}$. If we choose an
orthonormal basis for $T_pM$ (or make $\partial_1(p), \dots, \partial_n(p)$ one by taking normal
coordinates around $p$) the $g^{ij}$ becomes $\pm\delta^{ij}$ at $p$ (only!) and $R(p) = \sum_{i=1}^n \pm R_{ii}$.
So $R(p)$ is the sum of Ricci curvatures with respect to $n$ orthonormal
vectors, referred to at the end of 6.02; independence of choice follows, since
Definition 6.06 makes no use of bases to define $R(p)$.
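The two contractions can be carried out numerically. The sketch below is an illustration only: the model tensor is the constant-curvature form of Exercise 5.5b, and the metric $\mathrm{diag}(1, -1, -1, -1)$, the value of `kappa`, and the variable names are assumptions. It forms $R_{jl} = R^i_{jil}$ and then $g^{jl}R_{jl}$.

```python
import numpy as np

n, kappa = 4, 0.5
g = np.diag([1.0, -1.0, -1.0, -1.0])   # Lorentz metric of signature 2 - n
g_inv = np.linalg.inv(g)

# Lowered model tensor R_{ijkl} = kappa (g_il g_jk - g_ik g_jl)  (Exercise 5.5b)
R_low = kappa * (np.einsum("il,jk->ijkl", g, g)
                 - np.einsum("ik,jl->ijkl", g, g))

# Raise the first index: R^i_{jkl} = g^{im} R_{mjkl}
R_up = np.einsum("im,mjkl->ijkl", g_inv, R_low)

ricci = np.einsum("ijil->jl", R_up)           # R_{jl} = R^i_{jil}
scalar = np.einsum("jl,jl->", g_inv, ricci)   # scalar curvature g^{jl} R_{jl}
```

For this model tensor the Ricci tensor comes out as $-\kappa(n - 1)$ times the metric, so the scalar curvature is $-\kappa n(n - 1)$; the minus sign reflects the contraction convention of 6.07 and 6.08.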

6.10. Ricci Directions. If $M$ is Riemannian, by IV.4.09 we can always find
an orthonormal basis for $T_pM$ (and hence normal coordinates by IV.2.06,
around $p$) making $R_{ij}(p) = 0$ for $i \neq j$. If $M$ is pseudo-Riemannian we may
be able to do this, but not necessarily (IV.4.11). If we can, and the principal
directions of the Ricci tensor $\tilde{R}_p$ are uniquely defined, they are called the Ricci
directions of $M$ at $p$ and the corresponding $R_{ii}$ are the principal curvatures.

If $G$ has signature $2 - \dim M$, then $v^\perp$ for a timelike vector $v$ (the set
of “entirely spacelike” or “infinite velocity” vectors, according to an observer
who defines “zero velocity” by $v$, cf. XI.§2) inherits a negative definite metric.
Since $\tilde{R}$ restricts to a symmetric form on $v^\perp$ we can apply IV.4.09 to get
always a set of $(n - 1)$ spacelike Ricci directions relative to $v$.

If $\tilde{R}_p$ is isotropic (IV.4.10) at all $p \in M$, $M$ is called an Einstein manifold
(we explain why in XII.2.02). Just as “independence of plane” for sectional
curvature implies independence of position also (Schur’s Theorem), isotropy
of $\tilde{R}_p$, which makes Ricci curvature independent of direction, implies that it
is similarly independent of position. Before we prove this (6.14) we need to
discuss an important consequence of the second Bianchi identity.

The tensor involved is the Ricci tensor $\tilde{R}$ “with one index raised”, which we
shall denote by $\hat{R}$. ($\hat{R}(p)$ is essentially the map
$T_pM \to T_pM : x \mapsto G_\uparrow(y \mapsto \tilde{R}(x, y))$.
If $x$ points in one of the Ricci directions, its image is just $x$ times the
corresponding principal curvature: consider the proof of IV.4.09.) We shall call
$\hat{R}$ the bivariant Ricci tensor when we want a special name, and denote its
components in the usual way by $R^i_j = g^{ik}R_{kj}$. Notice that its sign involves
that of $G$ (cf. 6.08), that the scalar curvature $R(p)$ is just $\operatorname{tr}(\hat{R}(p))$, and that
$\hat{R}$ is self-adjoint since $\tilde{R}$ is symmetric.


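Now the Bianchi identity implies the following, writing $\hat{R}$ for the bivariant
Ricci tensor and $R$ for the scalar curvature, and considering the
$\binom{1}{1}$-tensor $\nabla_u\hat{R}$ as a map $T_pM \to T_pM$ and the $\binom{1}{2}$-tensor $\nabla\hat{R}$ as the map
$T_pM \to L(T_pM; T_pM) : v \mapsto (u \mapsto (\nabla_u\hat{R})v)$ (cf. VIII.7.02, 7.05):

6.11. Lemma. For any $v \in T_pM$, $v(R) = 2\operatorname{tr}((\nabla\hat{R})v)$.

In coordinates, since $v(R) = dR(v) = v^j\partial_jR$, this means (VIII.7.08)

$$\partial_jR = 2R^i_{j;i} .$$

Proof. Summing $n$ 2nd Bianchi identities,

$$R^h_{ihj;l} + R^h_{ijl;h} + R^h_{ilh;j} = 0 .$$

By skew-self-adjointness and $\nabla$’s commuting with contractions, this gives

$$R_{ij;l} + R^h_{ijl;h} - R_{il;j} = 0 ,$$

and since $\nabla$ commutes with raising and lowering indices,

  *   $(g^{ik}R_{ij})_{;l} + (g^{ik}R^h_{ijl})_{;h} - (g^{ik}R_{il})_{;j} = 0 .$

So summing over $k$ and $l$,

$$R^l_{j;l} + (g^{il}R^h_{ijl})_{;h} - (R)_{;j} = 0 .$$

But for any function $f$ we have $(f)_{;j} = f_{,j} = \partial_jf$, by VIII.7.08, and

$$\begin{aligned}
g^{il}R^h_{ijl} = g^{il}g^{kh}R_{kijl} &= g^{kh}(g^{il}R_{iklj}) \\
&= g^{kh}R_{kj} && \text{by the last identity of 6.07} \\
&= R^h_j .
\end{aligned}$$

So changing one dummy index $h$ to $l$,

$$\partial_jR = 2R^l_{j;l}$$

as required. □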

This result can equally be formulated as

$$(R^i_j - \tfrac{1}{2}R\,\delta^i_j)_{;i} = 0$$

applying VIII.7.09, which leads us to make

6.12. Definition. For any $\binom{1}{h}$-tensor field $J$, the $\binom{0}{h}$-tensor field

$$(v_1, \dots, v_h) \mapsto \operatorname{tr}[u \mapsto (\nabla_uJ)(v_1, \dots, v_h)]$$

with components $J^i_{j_1 \dots j_h;i}$ is called its divergence (we see why in IX.3.07) and
denoted by $\operatorname{div} J$. (In the case of $J$ a contravariant vector field on $\mathbb{R}^3$ with a
constant metric this reduces to the familiar “$\nabla \cdot J$”.)

6.11 thus deduces from the Bianchi identity, which as we mentioned
resembles a conservation law (2.10), that the divergence of the Ricci tensor is
half the gradient of the scalar curvature, and equivalently that the divergence
of the Einstein $\binom{1}{1}$-tensor field

$$E = \hat{R} - \tfrac{1}{2}RI \qquad (E^i_j = R^i_j - \tfrac{1}{2}R\,\delta^i_j \text{ in coordinates}),$$

where $\hat{R}$ is the bivariant Ricci tensor and $R$ the scalar curvature,
is identically zero. This result is the conservation equation of general
relativity. Geometrically it is a necessary condition for $\mathbf{R}$ to be the Riemann tensor
of some metric (as $\operatorname{div} v = 0$ is necessary for a vector field $v$ to be a “curl”
in $\mathbb{R}^3$); physical meaning it will acquire (XI.3.07) when we reach the physical
theory that uses it.

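6.13. Lemma. If $\dim M = n > 2$, the bivariant Ricci tensor $\hat{R}$ and the
Einstein tensor $E$ satisfy $\hat{R} = E - \frac{1}{n-2}(\operatorname{tr}E)I$. (And hence,
$E = \hat{R} - \frac{1}{2}(\operatorname{tr}\hat{R})I$, so that $\hat{R}$ and $E$ determine each other and thus carry
the same information.) In particular if $n = 4$, as it will often in the two
remaining chapters, $\hat{R} = E - \frac{1}{2}(\operatorname{tr}E)I$.

Proof. $\operatorname{tr}I = n$, as it is the sum of the $n$ 1’s on the diagonal of the identity
$n \times n$ matrix.

So, with $R = \operatorname{tr}\hat{R}$ the scalar curvature,

$$\operatorname{tr}E = \operatorname{tr}(\hat{R} - \tfrac{1}{2}RI) = R - \tfrac{1}{2}R\operatorname{tr}I = -\frac{n-2}{2}R .$$

Hence

$$E = \hat{R} + \frac{1}{n-2}(\operatorname{tr}E)I ,$$

and therefore

$$\hat{R} = E - \frac{1}{n-2}(\operatorname{tr}E)I .$$

□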
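The lemma is pure linear algebra and is easy to test numerically. In the sketch below an arbitrary $4 \times 4$ matrix stands in for the bivariant Ricci tensor (an assumption made purely for illustration; a genuine one would be self-adjoint); the Einstein operator is formed from it and then inverted by the formula of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
ricci_mixed = rng.normal(size=(n, n))   # stand-in for the bivariant Ricci tensor
identity = np.eye(n)

scalar = np.trace(ricci_mixed)                    # scalar curvature as a trace
einstein = ricci_mixed - 0.5 * scalar * identity  # E = Ricci - (1/2) R I

# Invert using the lemma: Ricci = E - (tr E) I / (n - 2)
recovered = einstein - np.trace(einstein) / (n - 2) * identity
```

The division by $n - 2$ is where the hypothesis $n > 2$ enters: for $n = 2$ the map from the bivariant Ricci tensor to $E$ loses the trace and cannot be inverted.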

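6.14. Lemma. An Einstein $n$-manifold, for $n > 2$, has constant scalar
curvature.

Proof. For the Ricci tensor $\tilde{R}$ we have $\tilde{R}_p = \lambda(p)G_p$ $\forall p \in M$, for some
$\lambda : M \to \mathbb{R}$. Hence

$$G^* \otimes \tilde{R} = \lambda\,G^* \otimes G .$$

Contracting, the scalar curvature is

$$R = n\lambda .$$

Hence

$$\tilde{R} = \frac{1}{n}RG , \qquad \hat{R} = \frac{1}{n}RI ,$$

“raising one dummy index” to pass to the bivariant Ricci tensor $\hat{R}$. Therefore

$$dR = 2\operatorname{div}\hat{R} = \frac{2}{n}\operatorname{div}(RI) .$$

But

$$\begin{aligned}
\operatorname{div}(RI) &= (R\,\delta^i_j)_{;i}\,dx^j && \text{in coordinates} \\
&= R_{;j}\,dx^j && \text{by VIII.7.09} \\
&= (\partial_jR)\,dx^j \\
&= dR ,
\end{aligned}$$

hence

$$dR = \frac{2}{n}\,dR .$$

Thus

$$dR = 0$$

if $n \neq 2$, hence $R$ is constant. □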

Note the similarity to Schur’s Theorem, not only in the result but in the 
use made of the Bianchi identity. 

6.15. Corollary. An Einstein $n$-manifold, for $n > 2$, has constant Ricci
curvature.

Proof. The Ricci tensor $\tilde{R} = \frac{1}{n}RG$ is a constant multiple of $G$: apply
Definition VIII.7.10. □


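Exercises X.6

1. Show that the coordinate expression for the divergence of a $\binom{1}{h}$-tensor
field $J$ with components $J^i_{j_1 \dots j_h}$, using VIII.7.08, is

$$J^i_{j_1 \dots j_h;i} = \partial_iJ^i_{j_1 \dots j_h} + \Gamma^i_{ki}J^k_{j_1 \dots j_h} - \sum_{t=1}^{h}\Gamma^k_{j_t i}\,J^i_{j_1 \dots j_{t-1}\,k\,j_{t+1} \dots j_h} .$$

In particular, for a $\binom{1}{0}$ and $\binom{1}{1}$ field respectively,

$$v^i_{;i} = \partial_iv^i + \Gamma^i_{ki}v^k , \qquad J^i_{j;i} = \partial_iJ^i_j + \Gamma^i_{ki}J^k_j - \Gamma^k_{ji}J^i_k .$$

2. Show that on a 2-manifold the Einstein tensor vanishes identically.


7. The Weyl Tensor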


If the dimension $n$ of $M$ is 1, the curvature tensor necessarily vanishes, since
$\mathbf{R}(u, v)$ is skew-symmetric in $u$ and $v$. If $n = 2$, $\frac{n^2}{12}(n^2 - 1) = 1$; since
contraction down to the scalar curvature $R$ is thus a linear, non-zero map between
1-dimensional spaces it is an isomorphism, and $\mathbf{R}$ reduces to a scalar as in §3.
In three dimensions,

$$\frac{n^2}{12}(n^2 - 1) = 6 = \frac{n}{2}(n + 1)$$

so that the Riemann tensor $\mathbf{R}$ and the Ricci tensor $\tilde{R}$ live in spaces of the same
dimension. It is not hard to show directly that contraction from $\mathbf{R}$ to $\tilde{R}$ is
surjective, hence for $n = 3$ an isomorphism. Thus on 3-manifolds the Riemann
tensor is determined by the Ricci tensor (cf. Exercise 3).

On a 4-manifold, however,

$$\frac{n^2}{12}(n^2 - 1) = 20 , \qquad \frac{n}{2}(n + 1) = 10$$

so the contraction has a non-zero kernel (and so on for $n > 4$, since $n^4$
increases much faster than $n^2$). What is lost?

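Consider what information is lost in general by contraction, or
equivalently by taking a trace. Any linear operator $A$ on an $n$-dimensional space $X$
can be expressed by

$$A = S + T , \quad \text{where } S = A - \frac{1}{n}(\operatorname{tr}A)I , \quad T = \frac{1}{n}(\operatorname{tr}A)I .$$

Then

$$\operatorname{tr}T = \frac{1}{n}(\operatorname{tr}A)\operatorname{tr}I = \frac{1}{n}(\operatorname{tr}A)\,n = \operatorname{tr}A , \qquad \operatorname{tr}S = 0$$

so that we have expressed $A$ in a natural way as a sum of “traceless”
and “traceable” parts: essentially decomposing $L(X; X)$ as the direct sum
$(\ker(\operatorname{tr})) \oplus \{\, xI \mid x \in \mathbb{R} \,\}$ (cf. Exercise VII.3.1).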
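The decomposition is simple enough to check on a concrete matrix. The following sketch (the function name `split_trace` and the sample matrix are illustrative) computes the traceless and traceable parts of an operator and leaves the sum and traces available for inspection.

```python
import numpy as np

def split_trace(A):
    """Return (S, T) with A = S + T, tr S = 0 and T a multiple of the identity."""
    n = A.shape[0]
    T = np.trace(A) / n * np.eye(n)
    S = A - T
    return S, T

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 4.0],
              [5.0, 0.0, 1.0]])
S, T = split_trace(A)
```

Here `T` carries the whole trace of `A` and nothing else, while `S` retains every other feature of `A`; the Weyl tensor plays the role of `S` for the Riemann tensor.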

We can decompose the Ricci transformation in this way, to get $\mathbf{R}_{u,v} =
\mathbf{S}_{u,v} + \mathbf{T}_{u,v}$ and have the trace $\tilde{R}(u, v)$ carried by $\mathbf{T}_{u,v}$ while $\mathbf{S}_{u,v}$
represents the information lost by taking the trace since $\operatorname{tr}\mathbf{S}_{u,v} = 0$. This
does not quite give a satisfactory decomposition of $\mathbf{R}$, since the two
reconstructed 4-tensors $\mathbf{S}$ and $\mathbf{T}$ do not have the symmetries 2.07.
However, by imposing the symmetries on them (in the way that for a bilinear
$A : X \times X \to \mathbb{R}$, for instance, $B(u, v) = A(u, v) + A(v, u)$ is its
symmetrised and $F(u, v) = A(u, v) - A(v, u)$ its skew-symmetrised form), and
seeking a similarly symmetrised term to allow for the remaining contraction
down to scalar curvature, we are led to
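7.01. Definition. The Weyl tensor on an $n$-manifold ($n > 2$) with metric
tensor field $G$, Riemann tensor $\mathbf{R}$, Ricci tensor $\tilde{R}$ and scalar curvature $R$ is
the $\binom{1}{3}$-tensor field defined by

$$\begin{aligned}
\mathbf{C}(u, v)w = \mathbf{R}(u, v)w
 &- \frac{1}{n-2}\bigl(\tilde{R}(v, w)u - \tilde{R}(u, w)v - (u \cdot w)r(v) + (v \cdot w)r(u)\bigr) \\
 &+ \frac{R}{(n-1)(n-2)}\bigl((v \cdot w)u - (u \cdot w)v\bigr)
\end{aligned}$$

where $r(v) = G_\uparrow(x \mapsto \tilde{R}(v, x))$, or equivalently by

$$\begin{aligned}
\mathbf{C}(u, v)w \cdot x = \mathbf{R}(u, v)w \cdot x
 &- \frac{1}{n-2}\bigl((u \cdot x)\tilde{R}(v, w) - (v \cdot x)\tilde{R}(u, w) - (u \cdot w)\tilde{R}(v, x) + (v \cdot w)\tilde{R}(u, x)\bigr) \\
 &+ \frac{R}{(n-1)(n-2)}\bigl((u \cdot x)(v \cdot w) - (v \cdot x)(u \cdot w)\bigr) .
\end{aligned}$$

In coordinates, then

$$C_{ijkl} = R_{ijkl} - \frac{1}{n-2}\bigl(g_{ik}R_{jl} - g_{il}R_{jk} - g_{jk}R_{il} + g_{jl}R_{ik}\bigr)
 + \frac{R}{(n-1)(n-2)}\bigl(g_{ik}g_{jl} - g_{il}g_{jk}\bigr) .$$

$\mathbf{C}$ is the analogue of $S$ in the simpler example above, since $C^i_{jik} = 0$
(Exercise 1c); it is the “traceless” or “contractionless” part of $\mathbf{R}$. $\mathbf{R}$ is
determined by $\mathbf{C}$ and its “traceable part” $\tilde{R}$, using $R = \operatorname{tr}\hat{R}$ (the trace of the
bivariant Ricci tensor) and any of the above three equations, thus $\mathbf{C}$ contains
exactly the information lost in contracting $\mathbf{R}$ to $\tilde{R}$. If the Ricci tensor (or
equivalently by 6.13 the Einstein tensor) vanishes, it is immediate that
$\mathbf{R} = \mathbf{C}$. Physically, this permits the Weyl tensor
to emerge in Chapter XII as the “vacuum curvature”.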
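The coordinate formula can be exercised numerically. The sketch below is an illustration under stated assumptions: the metric is $\mathrm{diag}(1, -1, -1, -1)$, the Ricci contraction is $R_{jl} = g^{ik}R_{ijkl}$ as in 6.07, and the test tensor is built from a random symmetric matrix by a Kulkarni-Nomizu style product, a construction that is not in the text and is used only to obtain a tensor with the curvature symmetries. It checks that the Ricci contraction of $\mathbf{C}$ vanishes, and that $\mathbf{C}$ itself vanishes for the constant-curvature form of Exercise 5.5b.

```python
import numpy as np

def kulkarni_nomizu(h, k):
    # h_ik k_jl + h_jl k_ik - h_il k_jk - h_jk k_il; for symmetric h, k this
    # has the skew and pair symmetries of a curvature tensor (assumed helper).
    return (np.einsum("ik,jl->ijkl", h, k) + np.einsum("jl,ik->ijkl", h, k)
            - np.einsum("il,jk->ijkl", h, k) - np.einsum("jk,il->ijkl", h, k))

def ricci(R_low, g_inv):
    # R_{jl} = R^i_{jil} = g^{ik} R_{ijkl}
    return np.einsum("ik,ijkl->jl", g_inv, R_low)

def weyl(R_low, g):
    n = g.shape[0]
    g_inv = np.linalg.inv(g)
    ric = ricci(R_low, g_inv)
    scalar = np.einsum("jl,jl->", g_inv, ric)
    ric_part = (np.einsum("ik,jl->ijkl", g, ric) - np.einsum("il,jk->ijkl", g, ric)
                - np.einsum("jk,il->ijkl", g, ric) + np.einsum("jl,ik->ijkl", g, ric))
    g_part = np.einsum("ik,jl->ijkl", g, g) - np.einsum("il,jk->ijkl", g, g)
    return R_low - ric_part / (n - 2) + scalar / ((n - 1) * (n - 2)) * g_part

g = np.diag([1.0, -1.0, -1.0, -1.0])
rng = np.random.default_rng(1)
h = rng.normal(size=(4, 4))
h = h + h.T                                   # random symmetric matrix

C = weyl(kulkarni_nomizu(h, h), g)
trace_of_C = ricci(C, np.linalg.inv(g))       # should vanish

R_model = 0.5 * (np.einsum("il,jk->ijkl", g, g) - np.einsum("ik,jl->ijkl", g, g))
C_model = weyl(R_model, g)                    # constant curvature: no Weyl part
```

The vanishing of `C_model` illustrates that a constant-curvature tensor is entirely "traceable": all of its information survives contraction to the Ricci tensor.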


Exercises X.7

1. a) Show that the three equations given above to define the Weyl tensor
are equivalent, and that the map $\mathbf{R} \mapsto \mathbf{C}$ is linear.

b) Show that $\mathbf{C}$ has the symmetries 2.07.

c) Show that $C^i_{jki} = 0$, and deduce that all contractions of $\mathbf{C}$ vanish.

2. a) The space $L(X; X) = X^* \otimes X$ inherits the metric $G^* \otimes G$ from a metric
$G$ on $X$ (cf. IV.1.12, V.1.08); is $A \mapsto (A - \frac{1}{n}(\operatorname{tr}A)I)$ orthogonal
projection onto $\ker(\operatorname{tr})$, with respect to this metric?

b) Is $\mathbf{R}_p \mapsto \mathbf{C}_p$ orthogonal projection onto the kernel of the contraction
from the Riemann tensor to the Ricci tensor, with respect to
the analogous metric?

3. a) Show that on a 3-manifold the Weyl tensor vanishes identically.
b) Deduce an expression for the Riemann tensor in terms of the Ricci
tensor, on a 3-manifold.
“Regard motion as though it were stationary, and what becomes of motion?
Treat the stationary as though it moved, and that disposes of the stationary.
Both these having been disposed of, what becomes of the One?”

Seng-ts’an

In this chapter and the next we examine the specific models of physical 
phenomena that grew from the considerations discussed in Chap. 0.§3. 


1. Orienting Spacetimes 

We start by introducing some ideas for general spacetimes, needed in this 
chapter and the next. Basically, our models for spacetime are Lorentz 4- 
manifolds (cf. VII.3.04). Now we add some definitions that will allow formal 
models for the motion of physical “particles”. 

1.01. Definition. Let $M$ be a connected Lorentz manifold.

Choose one timelike vector $v$ (VII.3.04) at some $p \in M$ as forward in
time. Then a timelike or non-zero null vector $w$ at $q \in M$ is also forward
if there is a continuous curve $c : [a, b] \to TM$ from $v$ to $w$ such that for no
$s \in [a, b]$ is $c(s)$ a spacelike or zero vector. (Notice that $c$ is not a curve in
$M$. Its projection $\Pi \circ c$ by the bundle map (VII.3.03), which is in $M$, is not
required to be a timelike curve.)

For any non-spacelike $w \neq 0$, such a curve will exist from $v$ to either $w$
or $-w$ (Exercise 1a). If for no $w$ does both happen, $M$ is time-orientable.
The choice of a $v$ is then a time-orientation of $M$, and $M$ with such a choice
made is time-oriented. In a time-oriented manifold, if a non-spacelike $w \neq 0$
is not forward it is backward.

A curve or path $c$ in a time-oriented manifold $M$ is forward (respectively,
backward) if its tangent vector $c^*(t)$ (VII.5.02) is forward (respectively,
backward) for all $t$. We shall usually parametrise timelike forward curves by
proper time (IX.4.05) denoted by $\sigma$.

$M$ is causal if there is no forward curve $c : [a, b] \to M$ with $c(a) = c(b)$.
(Exercise 1c,d,e; cf. also IX.6.07.) Physically, if $M$ is not causal you can go
forward in time to meet yourself starting out (or something can). Normal
ideas of causality break down and physics becomes very complicated. For
example, in an initial-value problem the data cannot be given arbitrarily,
since they must form part of their own solutions for time ahead and past. A
compact Lorentz manifold cannot be causal (why not?) so spacetime must be
noncompact, be noncausal or somewhere break down in its Lorentz structure.

Minkowski space is time-orientable and causal (Exercise 2), hence so
is the 4-dimensional affine space $X$ (without intrinsic coordinates) with a
constant Lorentz metric, which special relativity takes as a model for physical
spacetime. ($X$ will have this standard meaning throughout the chapter.)

1.02. Definition. A spacelike section of a time-oriented spacetime $M$ is a
smoothly embedded 3-manifold $S \subset M$ such that

(i) the induced metric on $S$ is everywhere negative definite

(ii) for every $p \in M$ there is a timelike curve through $p$ that meets $S$,
and either all such curves are forward from $p$ to $S$, all are backward,
or $p \in S$.

A forward curve $c : \,]a, b[\, \to M$ is called a history. A history $c$ with some
$c(t) \in S$ is at rest relative to $S$, or at $S$ rest, if $c^*(t) \cdot v = 0$ for all $v$ tangent
to $S$.

The term “history” is suggested by the idea of $c$ as a potential “trajectory
through spacetime of a particle (or physicist)”. Some books implicitly use
the term “world line” for the same concept (cf. IX.1.02). In fact (XII.§4) the
notion of “particle” is problematical in classical relativistic physics, which
strictly works only with fields. The approximate concept of a “particle” as
an entity at a point simply provides useful motivation in some discussions.
The “forward curve” in contrast is a mathematically precise concept that
we can discuss rigorously in what follows, even if it lacks a strictly precise
physical interpretation. (As indeed does even the definition of “derivative”,
for analogous reasons.)


Exercises XI. 1 

1. a) In a general connected Lorentz manifold $M$ with timelike $v \in T_pM$,
and timelike or null $w \in T_qM$, choose a path $\alpha : [a, b] \to M$ from $p$ to
$q$ and show by working in successive charts along its image that there
is a path $\tilde{\alpha}$ in $TM$ such that $\tilde{\alpha}(t)$ is always timelike and at $\alpha(t)$. Then
show that there is such an affine path $\gamma$ in $T_qM$ from $\tilde{\alpha}(b)$ either to $w$
or to $-w$, and combine $\tilde{\alpha}$ and $\gamma$ to get $c$ as in Definition 1.01. (Hint
for finding $\gamma$: show that there is such a $\gamma$ from $\tilde{\alpha}(b)$ to $w$ if and only
if $\tilde{\alpha}(b) \cdot w$ is positive.)

b) Show that $M$ is time-orientable if and only if there is no curve $c$ in $TM$
from some arbitrarily chosen non-spacelike $v$ to $-v$ with $c(t)$ never
spacelike or zero.

c) Show that $S^1 \times \mathbb{R}^3$ (where $S^1$ is the circle), with the metric given by
$(ds)^2 = (d\theta)^2 - (dx)^2 - (dy)^2 - (dz)^2$ in the obvious coordinates, is
time-orientable but not causal. Is it flat?

d) Show that whether a time-orientable manifold is causal does not
depend on the orientation.

e) Find a locally flat spacetime which is not time-orientable (think of
the Möbius strip or the Klein bottle $\times\ \mathbb{R}^2$). Can a non-time-orientable
spacetime be “causal”, in the sense of having no timelike closed curves?

2. Prove that Minkowski space $M^4$ is time-orientable and causal.
(Divide the non-spacelike vectors in its vector space $L^4$ into forward and
backward, and carry the distinction to each $T_xM^4$ by $d_x^{\leftarrow}$.)


2. Motion in Flat Spacetime 

2.01. Definition. An inertial frame F for the affine space X, with constant
Lorentz metric, that we consider throughout this chapter is a choice ${}^Fe_0$ of
a unit (hence non-null) forward vector in the vector space T of X. We shall
also denote by ${}^Fe_0$ the vector $d_x^{\leftarrow}({}^Fe_0) \in T_xX$, for any $x \in X$. (The reader
may add precision if he wishes by writing ${}^Fe_{0x}$, ${}^Fe_{0y}$ etc., but this multiplies
suffixes beyond comfort - particularly when x is replaced by c(t).) A particle
whose motion is described by a forward curve, or more precisely a history c
in X, is at rest relative to F, or at F rest, at c(t) if $c^*(t) = \lambda({}^Fe_0) \in T_{c(t)}X$
for some $\lambda \in \mathbf{R}$. An affine history c at F rest for some (and hence all) c(t)
will also be called an inertial observer, to whom the frame F is appropriate.

This amounts to the choice of a “rest velocity”, relative to which others
are to be measured. To measure them, we define the time-component $t_F(w)$
relative to F of a vector w in T or any $T_xX$ to be $w \cdot {}^Fe_0 \in \mathbf{R}$, and the
space-component to be $s_F(w) = w - t_F(w)\,{}^Fe_0 \in ({}^Fe_0)^{\perp}$ in T or $T_xX$. We
call vectors w with $t_F(w) = 0$ entirely spacelike relative to F, or entirely
F spacelike.

The time difference relative to F between $x, y \in X$ is (repeating the
label $t_F$ for a function of the two arguments x and y) $t_F(x,y) = t_F(d(x,y))$,
the time-component of their vector separation in X. If $t_F(x,y) = 0$, x and
y are F simultaneous. Their space-separation relative to F is $\mathbf{d}_F(x,y) =
s_F(d(x,y)) \in T$, and their F distance is $d_F(x,y) = \|\mathbf{d}_F(x,y)\|$. (The
quantity $t_F(x,y)$ is of course only a matter of direct physical measurement when
$\mathbf{d}_F(x,y) = 0$, since only then can there be an observer at rest relative to F
(that is, a physicist whose motion is approximated by a history at F rest) who
measures the time between x and y along his world line. For events off his
world line he has to infer a time label by allowing for the time he takes to
learn of them. However, $t_F(x,y)$ does give exactly the difference in the time
labels he uses, and $\mathbf{d}_F(x,y)$ the separation of his space labels.)

The velocity relative to F or F velocity of a forward curve $c : [a, b] \to X$
at c(t) is the vector, entirely spacelike (relative to F),

$$c^*_F(t) = \frac{s_F(c^*(t))}{t_F(c^*(t))} \in T_{c(t)}X .$$

Evidently this is equal to

$$d^{\leftarrow}_{c(t)}\left( \left(\frac{d}{dt}\, t_F(c(a), c(t))\right)^{-1} \frac{d}{dt}\, \mathbf{d}_F(c(a), c(t)) \right) .$$

Here the scalar differential, equal to $t_F(c^*(t))$, allows for variety in
parametrisation (c may not be going ahead in time at unit speed according to F) and
the vector (by abuse of language considered free in T rather than bound at
$\mathbf{d}_F(c(a),c(t))$ to simplify the expression) is the derivative of “spatial
position” according to F.

The F speed of c at c(t) is the real number

$$v_F(t) = \sqrt{|c^*_F(t) \cdot c^*_F(t)|} ,$$

(| | needed since $c^*_F(t)$ is spacelike). Evidently $v_F(t) = 0$ if and only if c is at
F rest at c(t).
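The definitions of 2.01 can be tried out numerically. The following is a minimal sketch, assuming an orthonormal basis in which the metric has signature (+, -, -, -) and the frame vector is the first basis vector; the function names and the sample tangent vector are illustrative choices of ours, not notation from the text.

```python
import math

def dot(a, b):
    """Lorentz dot product in an orthonormal basis, signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def time_component(w, e0):
    """t_F(w) = w . e0 for the frame vector e0."""
    return dot(w, e0)

def space_component(w, e0):
    """s_F(w) = w - t_F(w) e0, which is orthogonal to e0."""
    t = time_component(w, e0)
    return tuple(wi - t * ei for wi, ei in zip(w, e0))

def f_velocity(tangent, e0):
    """F velocity s_F(c*) / t_F(c*) of a forward tangent vector c*."""
    t = time_component(tangent, e0)
    return tuple(si / t for si in space_component(tangent, e0))

def f_speed(tangent, e0):
    """F speed sqrt(|v . v|); the modulus is needed because v is spacelike."""
    v = f_velocity(tangent, e0)
    return math.sqrt(abs(dot(v, v)))

e0 = (1.0, 0.0, 0.0, 0.0)        # hypothetical frame vector
tangent = (2.0, 1.0, 0.0, 0.0)   # hypothetical forward timelike tangent
print(f_velocity(tangent, e0), f_speed(tangent, e0))
```

Rescaling the tangent vector leaves the F velocity unchanged, which is the point of dividing by the time-component: the result does not depend on the parametrisation of c.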

2.02. Time Dilation. By IX.4.02, if $\mathbf{d}_F(c(a), c(b)) = 0$ but c is not always
at F rest, then time measured along c is less than $t_F(c(a),c(b))$. This is
physically interpreted as the (experimentally verified) statement that “time
passes more slowly” or is “dilated” for a clock (atomic, or heartbeat, or ...)
whose motion is described within experimental error by the history c. We now
have the notation to derive some classical formulae for the results obtained
geometrically in IX.4. Assume that $c^*(t)$ never vanishes and reparametrise
as follows:

Define $f : [a, b] \to [0, t_F(c(a), c(b))] : t \mapsto t_F(c(a), c(t))$. Then

$$\frac{df}{ds} = t_F(c^*)$$

which never vanishes, since ${}^Fe_0$ is orthogonal only to spacelike vectors. Thus
f has a smooth inverse g and we can define $\bar b = t_F(c(a),c(b))$ and

$$\bar c = c \circ g : [0, \bar b] \to X$$

has $\bar c(t)$ as “that position in X, on the curve c, which according to an observer
at F rest is t later than c(a)”. Then if (cf. VII.5.02)

$$\sigma : [0, \bar b] \to \mathbf{R}$$

is arc length (proper time) along $\bar c$ we have (cf. IX.4.05)

$$\begin{aligned}
\frac{d\sigma}{dt}(t) &= \sqrt{\bar c^*(t) \cdot \bar c^*(t)} \\
&= \sqrt{\big(t_F(\bar c^*(t))\big)^2 + s_F(\bar c^*(t)) \cdot s_F(\bar c^*(t))} \quad \text{(why?)} \\
&= \sqrt{\big(t_F(\bar c^*(t))\big)^2 + \big(t_F(\bar c^*(t))\,\bar c^*_F(t)\big) \cdot \big(t_F(\bar c^*(t))\,\bar c^*_F(t)\big)} \quad \text{by definition of } \bar c^*_F \text{ (2.01)} \\
&= \sqrt{1 - v_F^2}
\end{aligned}$$

since $t_F(\bar c^*(t)) = 1$ by construction, and $-v_F^2 = \bar c^*_F(t) \cdot \bar c^*_F(t)$.
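The formula just derived can be checked by integrating it. The sketch below assumes a curve already parametrised by F time and moving along a single space axis, so that its tangent is (1, v, 0, 0) in an orthonormal basis; the out-and-back history and the midpoint-rule integrator are our own illustrative choices.

```python
import math

def dot(a, b):
    """Lorentz dot product in an orthonormal basis, signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def proper_time(f_velocity_at, f_time, steps=10000):
    """Midpoint-rule integral of sqrt(c*(t) . c*(t)) over F time [0, f_time]."""
    dt = f_time / steps
    total = 0.0
    for k in range(steps):
        v = f_velocity_at((k + 0.5) * dt)
        tangent = (1.0, v, 0.0, 0.0)   # time-component 1 by construction
        total += math.sqrt(dot(tangent, tangent)) * dt
    return total

def out_and_back(t):
    """Hypothetical history: F speed 0.6 outward for 5 units, then back."""
    return 0.6 if t < 5.0 else -0.6

print(proper_time(out_and_back, 10.0))     # close to 10 * sqrt(1 - 0.36) = 8
print(proper_time(lambda t: 0.0, 10.0))    # close to 10 for a history at F rest
```

The history returns to its starting position in F, and the clock carried along it records less time than the 10 units measured by the observer at F rest, as in IX.4.02.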

This is the classical formula for “relativistic time dilation”. The formula
applies even when c does not “F return” by having $\mathbf{d}_F(c(a),c(b)) = 0$ and in
particular, when c is at rest in some other inertial frame $F'$. But if $v_{F'}$ is the
speed relative to $F'$ of a history h at F rest, the same formula applies; relative
to $F'$, the time for h is dilated. This symmetry between two inertial frames,
and consequently between two affine histories (two inertial observers), helps
the feeling of “paradox” that persists in some quarters. It was tempting to
transfer the symmetry to the case of one history not affine. But the formula
applies only when F is constant. If interpreted as the frame used by an
observer P following c, who sets ${}^Fe_0 = c^*$, this requires that $d_{c(t)}c^*(t)$ be
constant. Otherwise ${}^Fe_0$ changes, and the affine subspace $S_t$ of points in a
purely spacelike relation to c(t) (“F simultaneous with c(t)”) turns, so that
P’s time labels become more complicated. A sharp acceleration turns $S_t$
rapidly. In Fig. 2.1, L is the path of an inertial observer Q between the same
points in X. While P is turning, his labelling system considers proper time
along L to change faster than P’s own time labels for the points.

Computing a correct formula, that would reduce to $\sqrt{1 - v_F^2}$ for P
inertial, requires bringing in the rate of change of $c^*$. The result is messy, and
physically meaningless anyway: $\frac{d\sigma'}{d\sigma}$ would mean “how fast his proper time
$\sigma'$ is changing relative to mine, $\sigma$, right now”, namely at $L \cap S_\sigma$ using the
frame of reference ${}^Fe_0 = c^*$. But as we can’t properly compare watches till
we reach the same point, and can’t detect what is happening “right now”
except where we are, this is essentially a fiction. Only the integrals of the
proper times signify for comparison. (Note how P’s labelling system breaks
down at points like N.)

It is thus safer to treat comparisons of proper time along curves in the
manner of IX.4: use $\sqrt{1 - v_F^2}$ only with a fixed inertial frame F - and then
cautiously.

We come now to an even earlier formula, predating Einstein: 

2.03. Lemma (Lorentz-Fitzgerald contraction). Let $x, y \in X$ have spacelike
separation $d(x,y) \in T$, with F distance l where F is a frame of reference
in which d(x,y) is entirely spacelike (x, y are F simultaneous). Then in the
frame appropriate to an inertial observer Q whose F speed is $v_F$, their distance
is



In general, if Q’s F velocity $v_F$ is linearly independent of d(x,y),



where $w_F = |v_F \cdot e|$, e being the unit vector in the d(x,y) direction.

Proof. Exercise 1. □ 

(2.03 is a simpler statement than we could make about measurements of 
the length of a rod - the usual description - because that would require going 
into what point in the history of one end we compare with a given point for 
the other to get “length”. Not even the ends of a stick can be simultaneous 
absolutely.) 

The time and space measurement alterations of 2.02 and 2.03 were at 
the core of the original formulation of special relativity, from the postulate 
that no experiment whatever can prove one observer “at rest” rather than 
another, in particular that all observers must obtain the same value for the 
velocity of light. Minkowski was the first to replace time and space with
these “corrections” by a single flat geometric spacetime, now called Minkowski
space, which different observers resolve differently into separate “space” and
“time”.

2.04. Four-velocity. We shall interpret the mathematical models of this
chapter and the next in the usual physical way. We suppose that a physicist
whose motion we describe by a history c perceives a motion into the future
at unit speed (one second per second), using at each point c(t) the frame
appropriate to the affine history tangent there to c. So the perceived “velocity”
through spacetime is the unit tangent vector in the direction of $c^*(t)$. If c is
parametrised by proper time $\sigma$ (arc length along timelike curves, IX.4.05),
the perceived vector at $c(\sigma)$ is thus exactly $c^*(\sigma)$. For the conveniences this
offers, we shall always assume parametrisation by arc length (rather than, for
instance, the F-dependent parametrisation $\bar c$ given in 2.02, “parametrisation
by F-time”) unless otherwise stated.

The vector $c^*(\sigma)$ is often called the 4-velocity of c at $c(\sigma)$. For it was
discovered as a set of four components (Exercise 2),

$$\left(\frac{dx^0}{d\sigma}, \frac{dx^1}{d\sigma}, \frac{dx^2}{d\sigma}, \frac{dx^3}{d\sigma}\right) = \left(\frac{1}{\sqrt{1 - v_F^2}}, \frac{v^1}{\sqrt{1 - v_F^2}}, \frac{v^2}{\sqrt{1 - v_F^2}}, \frac{v^3}{\sqrt{1 - v_F^2}}\right)$$

(where $v^1, v^2, v^3$ are the components of the space velocity $c^*_F$), which
“transform as a vector”. That is, the rule on the right produces four functions
for each choice of affine chart $X \to \mathbf{R}^4$, with the results for different bases
appropriately related (cf. VII.4.04). (When one considers how many rules 
producing four functions for each chart do not have this property, it always 
seems rather wonderful when it appears. Unless one defines the vector first, 
then derives the components, so that it is true automatically.) 
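To see “transform as a vector” at work, one can compare two routes to the components in a second frame. The sketch below is an illustration under assumptions of our own: motion along a single axis, a second frame moving along the same axis with speed u relative to F, the standard one-dimensional Lorentz boost for the change of basis, and the standard velocity composition law (v - u)/(1 - uv) for the space velocity seen in the second frame. Neither of the last two is derived in this section.

```python
import math

def four_velocity(v):
    """Components (1, v1, v2, v3) / sqrt(1 - v.v) for a space velocity v."""
    speed_sq = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (gamma,) + tuple(gamma * vi for vi in v)

def boost_x(u, w):
    """Components of the vector w in a frame moving with speed u along axis 1."""
    g = 1.0 / math.sqrt(1.0 - u * u)
    return (g * (w[0] - u * w[1]), g * (w[1] - u * w[0]), w[2], w[3])

v, u = 0.5, 0.3   # hypothetical speeds, in units where null vectors have speed 1

# Route 1: apply the rule in F, then change basis.
transformed = boost_x(u, four_velocity((v, 0.0, 0.0)))

# Route 2: find the space velocity in the new frame, then apply the rule there.
v_new = (v - u) / (1.0 - u * v)
direct = four_velocity((v_new, 0.0, 0.0))

print(transformed)
print(direct)
```

The two routes agree to rounding error, which is what it means for the rule to define a vector rather than a frame-dependent list of four numbers.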

2.05. Momentum. Notation: we shall henceforth avoid the use of p to denote
a point.

One usually first meets momentum as the contravariant vector p = mv,
“mass times velocity”. A little surprisingly, it is fundamentally a covariant
vector as it arises physically.

There are a number of reasons for this. For example, in Newtonian
mechanics a typical force acting on a particle is the gradient of a potential,
$\Phi$ say. But $d\Phi$ is a one-form. So to have

“rate of change of momentum = force”

one must either have p covariant or $d\Phi$ contravariant -

either “$\frac{d}{dt}(mv) = G_\uparrow(d\Phi)$” or “$\frac{d}{dt}\big(G_\downarrow(mv)\big) = d\Phi$”.

The second approach has several advantages. For instance, a simpler
right hand side to integrate when we want the total change in the quantity
differentiated on the left (whatever the exact meanings of the differentiation
and integration). Moreover, it turns out that a Hamiltonian is best defined
as a function $H : T^*M \to \mathbf{R}$, where M is the space of possible configurations
for some system. Then $f \in T^*M$ rather than $G_\uparrow(f)$ is used as a description
of the momentum and H suffices to define a flow $\phi$ on $T^*M$ entirely without
further reference to G. (G is of course usually involved in defining H.) If
the system has configuration $q \in M$ and momentum $p \in T^*M$ at time 0, it
will have momentum $\phi(p,t) \in T^*M$ at time t and position $\Pi^*(\phi(p,t))$ in
M (cf. VII.3.03). Thus $\phi$ fully describes the motions of the system. If we
worked in TM, G would be spuriously present in our computations, obscuring
the essential geometry of what is happening. (The geometric treatment of
classical mechanics needs differential forms, which we have had to defer to
a later volume. The reader is referred to [Abraham and Marsden], [Souriau]
or - most easily read - [Maclane (1)] for good geometric accounts.)
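The claim that H alone suffices to define the flow can be illustrated in the simplest case. The sketch below assumes a one-dimensional configuration space, the usual Hamilton equations (rate of change of q is the p-derivative of H, rate of change of p is minus the q-derivative), and a semi-implicit Euler step with finite-difference derivatives; all of these are standard choices of ours rather than anything developed in the text. The integrator is handed only the function H and never sees a metric.

```python
import math

def hamiltonian_flow(H, q, p, t, steps=10000, h=1e-6):
    """Approximate the flow of H on (q, p) for time t, using only values of H."""
    dt = t / steps
    for _ in range(steps):
        # dp/dt = -dH/dq, estimated by a central difference
        dH_dq = (H(q + h, p) - H(q - h, p)) / (2.0 * h)
        p = p - dt * dH_dq
        # dq/dt = dH/dp, evaluated at the updated momentum
        dH_dp = (H(q, p + h) - H(q, p - h)) / (2.0 * h)
        q = q + dt * dH_dp
    return q, p

def oscillator(q, p):
    """Hypothetical Hamiltonian: unit mass on a unit spring."""
    return 0.5 * p * p + 0.5 * q * q

q1, p1 = hamiltonian_flow(oscillator, 1.0, 0.0, 2.0 * math.pi)
print(q1, p1)   # close to the starting point (1, 0) after one period
```

Any metric used to build the kinetic term is buried inside H; the stepping rule itself works on position and covariant momentum directly.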

This irrelevance of G once H is defined is not hard to prove in classical 
mechanics, but it seems as strange there as does the fact that variational 
principles work (cf. IX.§4, initial remarks). The explanation is again that 
classical theory is a limiting case of that more deeply comprehensive system, 
quantum mechanics. 

Let us therefore give a sketchy account of quantum momentum, hoping
to make the reader more receptive to covariant momentum vectors in what
follows. The quantum description of something’s motion is a wave. A complex
wave function $\psi$ determines the probability of finding the something
near a given point. The simplest non-trivial solution of all to the simplest
wave equation is a scalar plane wave filling flat space,

$$\psi(x, t) = \cos(f(x) - \omega t) + i\sin(f(x) - \omega t)$$

in “space S and time T separate” language, where $\omega \in \mathbf{R}$ and $f : S \to \mathbf{R}$
is affine. (In “spacetime X” language it is even simpler, just $\psi(x) =
\cos(g(x)) + i\sin(g(x))$ where $g : X \to \mathbf{R}$ is affine. But we shall stay
non-relativistic for the present.)

Now, this describes the wave, and hence the motion. But unless we know
the particular metric, we do not know what to mean by the direction of the
motion described. We know how the phase planes “$f(x) - \omega t$ = constant”
change with t. But we can say how they move only if we assume that they
do not “slip sideways” (Fig. 2.2): that they move in a direction orthogonal
to themselves. Thus if we define $\mathbf{f}$ on free vectors as the linear part of f,
the velocity is $v = G_\uparrow(\mathbf{f})$ for G our choice of metric. But, up to a constant
m say, the covariant momentum $G_\downarrow(mv) = m\mathbf{f}$ is already contained in the
geometry, independently of G.
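A two-dimensional computation makes the dependence on G concrete. In this sketch the covector, the two metrics and the helper names are hypothetical examples of ours; the only operation used is raising an index, that is, solving G v = f for v.

```python
def raise_index(G, f):
    """Solve G v = f for v (index raising) with a 2x2 symmetric metric G."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return ((d * f[0] - b * f[1]) / det, (-c * f[0] + a * f[1]) / det)

def metric(G, x, y):
    """Value G(x, y) of the bilinear form on two vectors."""
    return sum(G[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

f = (1.0, 0.0)                        # linear part of the phase function
wavefront = (0.0, 1.0)                # spans ker f, whatever the metric

G1 = ((1.0, 0.0), (0.0, 1.0))         # hypothetical standard metric
G2 = ((2.0, 1.0), (1.0, 1.0))         # hypothetical skewed metric

v1 = raise_index(G1, f)
v2 = raise_index(G2, f)
print(v1, v2)                         # two different "directions of motion"
print(metric(G1, v1, wavefront), metric(G2, v2, wavefront))
```

The same covector f, with the same wavefronts, yields different velocity vectors under the two metrics, each orthogonal to the wavefronts in its own metric. Only f is determined by the wave itself.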

It is appropriate in the plane wave solution to have v a free vector, since
this solution corresponds to knowing momentum exactly and correspondingly
(satisfying the uncertainty principle) position not at all. A more complicated,
but more useful solution can govern the motion by a “wave packet” localised
at some time in a region U of S, like a short train of waves going down
a canal. This can still be approximated by a plane wave at each point, if
the differential calculus is to be trusted, but the results at different places
will be different (think of ripples on a pond). So our momentum $\mathbf{f}$ becomes
a distinct linear functional for each $x \in S$, hence a cotangent vector field
(zero outside U). What we measure as momentum will be $\mathbf{f}_x$ at the point
x where we happen to find the particle (roughly speaking) and this becomes
less accurately predictable as position becomes more so, as the smaller a U
is allowed for $\mathbf{f}$ to be non-zero the more it must vary.

The quantum mechanical Hamiltonian governs the evolution of $\psi$ by the
Schrödinger equation. Thereby it controls the changes in the region $U \subset S$,
in which the particle is localised, and the covariant vector field $\mathbf{f}$, with
$\mathbf{f}|_{S \setminus U}$ zero. In the classical limit (as Planck’s constant is taken to 0) U shrinks
to a point position $q \in S$ and $\mathbf{f}$ yields a classical momentum $p \in T^*S$.
Correspondingly the classical Hamiltonian controls directly the evolution of
q and p, not q and v or mv.

Non-relativistic momenta, then, are covariant vectors. Seeking to make 
the closest possible analogy: 

2.06. Definition (temporary). The 4-momentum at $c(\sigma)$ of a history c
parametrised by proper time $\sigma$ is the covariant vector $p(\sigma) = mG_\downarrow(c^*(\sigma)) \in
T^*_{c(\sigma)}X$. Here m is a positive real number associated at $c(\sigma)$ with the history,
called its mass or rest mass. (The physical definition of mass for a particle
P depends on the interactions of P with other things. We will redefine
it mathematically in 2.09, for a closer relation with the way it is used in
physics.)

In coordinates, p has components $mg_{ij}\frac{dc^j}{d\sigma}$, where $c(\sigma) = (c^0(\sigma), c^1(\sigma),
c^2(\sigma), c^3(\sigma))$, by the normal formula for $G_\downarrow$ (IV.3.02). Choose an inertial
frame F, an orthonormal basis ${}^Fe_0, e_1, e_2, e_3$ for T, and a chart $\phi : X \to \mathbf{R}^4$
giving $d_x(\partial_i) = e_i$, for all $x \in X$. This means that p is represented by

$$(p_0, p_1, p_2, p_3) = \left(\frac{m}{\sqrt{1 - v_F^2}}, \frac{-mv^1}{\sqrt{1 - v_F^2}}, \frac{-mv^2}{\sqrt{1 - v_F^2}}, \frac{-mv^3}{\sqrt{1 - v_F^2}}\right)$$

in the notation of 2.01, 2.04, since $g_{00} = 1$, $g_{11} = g_{22} = g_{33} = -1$. Now

$$p_0 = m(1 - v_F^2)^{-1/2} = m\left(1 + \tfrac{1}{2}v_F^2 + \tfrac{3}{8}v_F^4 + \tfrac{5}{16}v_F^6 + \tfrac{35}{128}v_F^8 + \cdots\right).$$

If $v_F$ is small compared to the speed 1 ascribed in any frame to a null vector,
this is well approximated by

$$p_0 = m + \tfrac{1}{2}mv_F^2 .$$

The second term is just the classical one for kinetic energy of a particle.
The higher order terms only become significant at “relativistic speeds” as
measured by F (those for which the “correction factor” $\sqrt{1 - v_F^2}$ becomes
important) so they may be thought of as a correction for higher speeds to
the kinetic energy.
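How quickly the higher order terms matter can be seen by comparing partial sums of the series with the exact value. This is a small sketch in units where null vectors have speed 1; the function names and sample speeds are our own.

```python
import math

def p0_exact(m, v):
    """Time component m / sqrt(1 - v^2) of the 4-momentum."""
    return m / math.sqrt(1.0 - v * v)

def p0_series(m, v, terms):
    """Partial sum of m (1 + v^2/2 + 3 v^4/8 + 5 v^6/16 + 35 v^8/128 + ...)."""
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff * v ** (2 * k)
        coeff *= (2 * k + 1) / (2 * k + 2)   # next binomial coefficient
    return m * total

for v in (0.01, 0.1, 0.5, 0.9):
    newtonian = p0_series(1.0, v, 2)         # rest mass plus classical kinetic energy
    print(v, p0_exact(1.0, v) - newtonian)   # what the two-term formula misses
```

At a speed of 0.01 the two-term approximation is off by only a few parts in a billion of the rest mass, while at 0.9 the neglected terms are of the same order as the rest mass itself.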

Thus we may call $(p_0 - m)$ the F relativistic kinetic energy of the history
at $c(\sigma)$. This indeed agrees with the energy needed to accelerate a particle
described by c from F rest to $c^*(\sigma)$, as measured in F. But energy is not really
a relativistic notion, depending as it does on the frame. (Even in Newtonian
mechanics the energy $\frac{1}{2}mv^2$ is not absolutely defined, as there has been no
meaning attached to “absolute zero velocity” since physics abandoned the
viewpoint of Aristotle.)

The restriction of $p : T_{c(\sigma)}X \to \mathbf{R}$ to entirely F spacelike vectors is

$$u \mapsto \frac{m}{\sqrt{1 - v_F^2}}\, u \cdot c^*_F(\sigma) ,$$

or in components

$$u^i\partial_i \mapsto \frac{-m\,u^i v^i}{\sqrt{1 - v_F^2}} \quad \text{(summing over } i = 1, 2, 3\text{)}.$$

We call $p_0 = \frac{m}{\sqrt{1 - v_F^2}}$ the F relativistic mass $m(v_F)$. Using it the map is

$$u \mapsto m(v_F)\, u \cdot c^*_F(\sigma) \quad \text{or} \quad u^i\partial_i \mapsto -m(v_F)\, u^i v^i$$

which is just the classical, space and time separate, momentum $mG_\downarrow(v)$
except for the “corrected” mass (and our choice of sign for the metric). Notice
that this device translates the “kinetic energy” part of the $p_0$ into “mass”,
not just the “rest mass” part: $p_0$ can be regarded as entirely “mass”, and
justifies this in terms of F by giving the “resistance to acceleration”. In 2.07
we translate the “rest mass” part into “energy”.

Classical momentum, like classical energy, depends on a choice of rest
velocity and so is not absolutely defined. Special relativity fuses these two
observer-dependent quantities into the geometrically defined 4-momentum,
which requires no arbitrary choices for its definition. In this way the theory
is much more “absolute” than Newtonian mechanics, most of whose
propositions are relative to a choice of inertial frame.


2.07. Collisions, Mergers, Splits. We have not yet modelled any forces as 
influences on our histories. Until we do so, we shall assume that the histories 
are the geodesics of flat spacetime - straight lines, affinely parametrised by 
arc length - in an obvious analogy to Newton’s First Law. What happens 
when two or more collide? 

Without considering the short range forces involved, it is natural to 
require the sum of the 4-momenta afterwards to equal the sum of the 4- 
momenta before. This corresponds to the Newtonian conservations of energy 
and momentum separately, and implies them as low speed approximations in 
a particular inertial frame. But it does not reduce to them, even in the low 
speed approximation - it is more general, and simpler. 

Newtonian mechanics requires momentum to be conserved in all collisions.
But for conservation of energy it requires either that the collision be
“perfectly elastic” or that the description of the particles include details of the 
ways their internal structure can absorb the energy that does not reappear as 
gross movement. These ways always involve heat, and hence raise questions 
of statistical mechanics and thermodynamics. Thus Newtonian conservation 
of energy is simple enough for school texts when describing idealised billiard 
balls, but it becomes very complicated to say anything interesting about the 
collision of two balls of wet putty. 

Relativistic mechanics, by contrast, requires conservation of the whole 
4-momentum for all collisions - whether the histories bounce “elastically” 
(Exercise 3) or soggily, or stick together, the sum of the 4-momenta must 
remain unaltered. 


Of course, if a vector is unaltered its individual components in any
given basis cannot change either. Thus for a given frame F the sum of
the $p_1 = \frac{-mv^1}{\sqrt{1 - v_F^2}}$ for the various histories must be conserved, and
similarly for the $p_2$ and $p_3$; for $v_F$ small this approximates the conservation
of Newtonian momentum. We may therefore call the purely spacelike vector
$p_F = p_1dx^1 + p_2dx^2 + p_3dx^3$ the F momentum. But conservation of $p_0$ implies
something new. Consider for instance appropriate histories for two blobs of
putty which travel towards each other with great speed, meet and merge into
one blob at F rest. Let them have F velocities $v_1$, $v_2$, F relativistic kinetic
energies $E_1$, $E_2$, and rest masses $m_1$, $m_2$ before the collision. Then

Total $p_0$ before merger = $(m_1 + E_1) + (m_2 + E_2)$.

Total $p_0$ after merger = m, the rest mass of the new blob,

$$m = (m_1 + m_2) + (E_1 + E_2)$$

so the total rest mass present has increased by $E_1 + E_2$; “energy has become
mass”. Rest mass is not conserved, though p and $p_0$ are.
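The bookkeeping for the putty blobs is easily mechanised. The sketch below assumes motion along a single space axis and uses the relation $m^2 = p \cdot p$ (Exercise 4a) to read off the rest mass of the merged blob from the conserved total; the sample masses and speeds are hypothetical.

```python
import math

def covariant_momentum(m, v):
    """Components (p0, p1) = (m, -m v) / sqrt(1 - v^2) for motion along one axis."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (m * gamma, -m * gamma * v)

def merge(blobs):
    """Sum the 4-momenta of (rest mass, F velocity) pairs; return total and rest mass."""
    p0 = sum(covariant_momentum(m, v)[0] for m, v in blobs)
    p1 = sum(covariant_momentum(m, v)[1] for m, v in blobs)
    rest_mass = math.sqrt(p0 * p0 - p1 * p1)   # m^2 = p . p
    return (p0, p1), rest_mass

blobs = [(1.0, 0.6), (1.0, -0.6)]              # equal blobs, opposite F velocities
total, m_after = merge(blobs)
kinetic = sum(covariant_momentum(m, v)[0] - m for m, v in blobs)   # E_1 + E_2

print(total)      # F momentum component vanishes: the merged blob is at F rest
print(m_after)    # rest mass after the merger
print(kinetic)    # the increase over m_1 + m_2
```

With these numbers each blob has $p_0 = 1.25$, so two unit rest masses merge into a single blob of rest mass about 2.5, the excess 0.5 being exactly the kinetic energy brought in.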

Conversely, run the same collision backward: we end up with less rest 
mass, more kinetic energy than we started with. Making a structure of static 
matter above a city fly apart into two (or many more) pieces at high speed
turns a few micrograms of its mass into a disastrous quantity of energy. 
(Many of the “pieces”, in practice, may have zero rest mass: cf. 2.10). 

We shall therefore refer to m equally as rest mass or as rest energy and 
po as F energy; mass and energy, as concepts, merge. Notice that both are 
anyway relative to a choice of inertial frame - changing frame changes them 
and their sum, along with momentum. Mass-energy and momentum are not 
equivalent in the arithmetic way that mass and energy are, however, but 
related in the same more subtle way as time and distance measurements are 
to each other. 

2.08. Unnatural Units. In this subsection (but nowhere else in Chapters I- 
XII), c refers to the number “speed of light”. Numerically it is close to 
3 x 10 8 metres per second. 

The discussion in Chap. 0.§3, without the special choice of units, would 
lead to a Lorentz metric given by 

$$G'(x,y) = c^2x^0y^0 - x^1y^1 - x^2y^2 - x^3y^3$$

and to equivalent but more complicated formulae. The “one second per second
in time, no change in space” perception of own movement by a physicist,
whose motion we describe by an affine history, gives a 4-velocity (1,0,0,0)
in an “appropriate frame” (cf. 2.01). The metric $G'$ assigns this a length
c, not 1, so proper time $\sigma$ differs by the factor c from “arc length”, as
parameters for curves. The time-dilation formula (2.02) becomes in these new
units:

$$\frac{d\sigma}{dt} = \sqrt{1 - \frac{v_F^2}{c^2}} .$$



The Lorentz-Fitzgerald contraction (2.03) becomes 



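The components of 4-velocity (2.04) become

$$\left(\frac{1}{\sqrt{1 - v_F^2/c^2}}, \frac{v^1}{\sqrt{1 - v_F^2/c^2}}, \frac{v^2}{\sqrt{1 - v_F^2/c^2}}, \frac{v^3}{\sqrt{1 - v_F^2/c^2}}\right)$$
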

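and correspondingly for a 4-momentum $m(G_\downarrow(v))$

$$\left(\frac{mc^2}{\sqrt{1 - v_F^2/c^2}}, \frac{-mv^1}{\sqrt{1 - v_F^2/c^2}}, \frac{-mv^2}{\sqrt{1 - v_F^2/c^2}}, \frac{-mv^3}{\sqrt{1 - v_F^2/c^2}}\right)$$

by IV.3.02, since $g_{00} = c^2$ and otherwise $g_{ii} = -1$. Hence, expanding as in
2.06 we get

$$p_0 = mc^2 + \tfrac{1}{2}mv_F^2 + \tfrac{3}{8}m\frac{v_F^4}{c^2} + \cdots$$

and the famous equation

$$E = mc^2$$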


for the rest energy. But we shall not use this “unnatural units” factor again. 
A clear discussion of conversions among commonly encountered systems of 
units will be found, among many other things, in [Synge]. 
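For a feeling of the sizes involved in conventional units, the following sketch evaluates the rest energy and compares the relativistic kinetic energy with the Newtonian one. It assumes SI units and the rounded value of c quoted above; the sample mass and speed are hypothetical.

```python
import math

C = 3.0e8   # metres per second, the rounded value quoted in 2.08

def rest_energy(mass_kg):
    """Rest energy m c^2 in joules."""
    return mass_kg * C ** 2

def f_energy(mass_kg, speed):
    """p_0 = m c^2 / sqrt(1 - v^2 / c^2) in joules, for speed in metres per second."""
    return mass_kg * C ** 2 / math.sqrt(1.0 - (speed / C) ** 2)

print(rest_energy(1.0e-3))                    # one gram, about 9e13 joules

m, v = 1.0, 3.0e5                             # one kilogram at a thousandth of c
relativistic = f_energy(m, v) - rest_energy(m)
newtonian = 0.5 * m * v ** 2
print(relativistic / newtonian)               # very close to 1 at this speed
```

At a thousandth of the speed of light the two kinetic energies differ by less than one part in a million, which is why the $mc^2$ term passed unnoticed in Newtonian practice.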

2.09. Definition 2.06 Reconsidered. In Newtonian terms, mass is the 
most fundamental quantity in sight. It is the “quantity of matter” in the 
particle, body, system, ... under consideration, conserved by the dynamics 
and by any change of inertial frame. So we were led, by the Newtonian 
idea of particles as bits of mass flying about, into defining p in terms of m. 
But since m is not conserved in collisions, this is backwards. In physics, 
what is more conserved is more fundamental. Thus, 4-momentum is more 
fundamental than energy is more fundamental than heat is more fundamental 
than temperature. So we shall from now on think of particles as bits of 4- 
momentum flying about. 

More precisely, since when a particle can interact with a field its
4-momentum can change continuously, we think of the particle as its history
c and a non-zero cotangent vector field p along c, such that $G_\uparrow(p(\sigma))$ is
always a scalar multiple of $c^*(\sigma)$. (Since a classical particle, unlike a field
or a probability wave, does have a well defined “direction” for its motion,
4-momentum should be “in that direction” according to G.) We can then
define rest mass m by

$$m^2 = p \cdot p$$

(Exercise 4a) and let the question of whether it is conserved depend on the 
detailed physics of the particle and the ambient field. For a re-entering space 
module it is not, for an electron it is conserved as long as the electron is. 
To call either - or anything - a “particle” is an approximation, if “particle” 
is not given a new meaning. Relativistic quantum field theory may provide 
such a meaning. 

2.10. Zero Rest Mass. In 2.06 we were restricted to a timelike history, 
because only on this had we a unit forward tangent vector as 4-velocity. So 
the history of a “light particle” or photon following a null curve c, however 
parametrised, will have

$$m^2 = p \cdot p = G_\downarrow(c^*(t)) \cdot G_\downarrow(c^*(t)) = c^*(t) \cdot c^*(t) = 0 ,$$

zero rest mass. We know photons have momentum - light pushes things - 
but there is no frame-independent way of assigning them an m that makes 
the old mv definition work. 

Suppose a zero rest mass particle “slows down” onto a timelike curve 
from its initial null world line, to go at less than the limiting speed, while 
maintaining the zero rest mass character that is part of its identity. (If a 
gamma ray “slows” into an electron-positron pair then rest mass appears, 
but we consider that the gamma ray disappears.) It then satisfies 

$$p = 0\,G_\downarrow(c^*(t)) = 0$$

(Exercise 4b). Having neither energy nor momentum, it no longer exists. So 
a zero rest mass particle can only travel at the limiting speed. 

(Photons, incidentally, are not slowed down in their character as
electromagnetic or probability waves by air, glass or whatever non-vacuum
transparency they meet: only their group velocity is. See [Feynman] for an
excellent account of this distinction.)

Consider now a zero mass history c, with the wave aspect of the particle
whose motion we wish to describe approximated around c(t) by a plane wave
(cf. 2.05). Its 4-momentum p at c(t) is the linear functional with the contours
shown (Fig. 2.3). Notice that this pattern does correspond to something
moving in the direction $c^*(t)$: an observer passing through c(t) with timelike
4-velocity $t_i$ (choice of 6 shown) will perceive wavefronts moving to the right
at limiting speed. As for any functional, its timelike component $p_0$ is given by
its value on the timelike basis vector, here $t_i$, and its spacelike components
$p_1$, $p_2$, $p_3$ by its values on the chosen spacelike vectors $(s_i)_1$, $(s_i)_2$, $(s_i)_3$
orthonormal to $t_i$. (One represents these three for each i in Fig. 2.3, for
dimensional reasons.) So the observed energy E is just $p_0$, and the size p of
the observed “ordinary momentum” has $p^2 = p_1^2 + p_2^2 + p_3^2$. Thus $E^2 = p^2$
necessarily, since p is a null covector.

The energy E can be seen to be the number of wavefronts cut by a unit
timelike vector (this is frequency). Likewise, p is the number of wavefronts
cut by a unit purely spacelike (to the observer) vector in the direction of
travel (this is wave number, 1/wavelength, up to sign).

E and p do correspond to the energy and momentum, as measured in
the inertial frame F with ${}^Fe_0 = t_i$, transferred from emitter to receiver.
As can be seen, an observer fleeing the emitter will report lower energy and
lower wave number (longer wavelength), an observer advancing on it will
report the reverse. (These
effects are the well known red and violet shifts respectively, so called because
the colours are the low and high energy ends of the visible spectrum. Together
they are called the Doppler effect.)
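The shift can be computed directly as the value of the null covector on each observer's 4-velocity. The sketch below assumes a photon travelling in the positive direction of the first space axis, observers moving along that axis with F velocity beta, and orthonormal components with signature (+, -, -, -); the names and sample values are ours.

```python
import math

def photon_covector(energy):
    """Null covector (E, -E, 0, 0) of a photon moving in the positive 1-direction."""
    return (energy, -energy, 0.0, 0.0)

def observed_energy(p, beta):
    """Value of p on the unit timelike vector (1, beta, 0, 0) / sqrt(1 - beta^2)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    u = (gamma, gamma * beta, 0.0, 0.0)
    return sum(pi * ui for pi, ui in zip(p, u))

p = photon_covector(1.0)
print(observed_energy(p, 0.0))     # the observer at F rest sees energy 1
print(observed_energy(p, 0.6))     # fleeing the emitter: lower energy (red shift)
print(observed_energy(p, -0.6))    # advancing on the emitter: higher energy (violet shift)
```

A covector is applied to a vector here, so no metric enters once the components of p are known. Scaling the photon's energy scales every observer's reading by the same factor.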

Exercises XI.2 

1. Prove Lemma 2.03. Does this result apply to an accelerating observer,
measuring distances by his “instantaneous inertial frame” F given by
${}^Fe_0 = c^*(\sigma)$, or does it require corrections for this case like the time
dilation formula (2.02)? What lengths would he assign to an inertially
moving stick?

2. If F is an inertial frame, show that for an orthonormal basis $e_0$, $e_1$,
$e_2$, $e_3$ with $e_0 = {}^Fe_0$ and an affine chart $X \to \mathbf{R}^4$ giving $d_x(\partial_i) = e_i$,
$c^*(\sigma)$ has the component form given in 2.04.

3. What is the definition of a “perfectly elastic” Newtonian collision? 
Define a perfectly elastic relativistic collision without referring to an 
inertial frame. 

4. a) Show that p defined as $mG_\downarrow(c^*(\sigma))$ satisfies $p \cdot p = m^2$, if $c^*(\sigma)$ is
timelike.

b) Show that m defined as $\sqrt{p \cdot p}$ satisfies $p = mG_\downarrow(c^*(\sigma))$ if p is timelike
and $c^*(\sigma)$ is a scalar multiple of $G_\uparrow(p)$.

5. Suppose two zero rest mass particles travelling the same null curve
have energies $E_1$, $E_2$ at some point, as measured in a frame F. Show
that their energies $E_1'$, $E_2'$ relative to another frame $F'$ give

$$\frac{E_1'}{E_1} = \frac{E_2'}{E_2} .$$

(Red shifts are ratios, independent of energy, frequency, wavelength.)


3. Fields

Matter does not really come in particles of zero size. In Newtonian mechanics 
the centre of mass of a rigid body moves under gravity like a particle with 
the mass of the whole body, which makes the “particle” idealisation a handy 
one. Relativistically it is less reasonable: there is no way to define a rigid 
body, and the most interesting features of a zero rest mass entity are that its 
energy “is” frequency and its momentum “is” wave number, which cannot 
even be spoken of while it is regarded as entirely point-concentrated. We 
shall not refine into rigor the wave packet view used above by getting deeper 
into quantum mechanics, however. Rather, let us look at matter spread out 
smoothly, moving through spacetime, as classical hydrodynamics for instance 
considers it spread smoothly (and infinitely divisibly) through space. 

3.01. Describing a Flux of Matter. First let us mentally approximate a 
smooth flux of matter by a crowd of particles, colliding, merging and splitting 
(Fig. 3.1), following timelike or null curves. (This is a crutch towards a 
more precise concept.) Now, 4-momentum is carried along each path: thus 
we have a flux of 4-momentum forward in time along the network that the 
paths make. To know how the 4-momentum is distributed across a particular 
spacelike section is to know “where the matter is” in it. (Though not where 
each bit of matter in some earlier spacelike section “now is” - particles lose 
their individuality in an inelastic relativistic collision, just as in even a non- 
relativistic quantum one like two electrons bouncing off each other.) As in 
classical fluid mechanics, then, we wish in the smooth version to describe 
a flux. A flux of the vector quantity 4-momentum, not the scalar quantity 
mass; but let us consider for a moment the simpler problem of modelling a 
classical fluid. 



Fig. 3.1

Fig. 3.2


While a description by the velocity vector field is informative (VII.6.01) it 
is not quite complete for practical purposes unless the fluid is incompressible. 
If the density $\rho$ of the fluid in Fig. 3.2 is greater at the middle of the pipe
than its value, $\rho_b$ say, at the boundary, we get a larger total flux than if it
is equal to or less than $\rho_b$. So the field we want should describe how much
fluid is passing a given point, not just how fast it is going. If $\rho(x)$ is density
at x, then $\rho v$ is a natural candidate field. But how, geometrically, do we use
it in questions of “total flux”? Once again we have to get the variance right. 

To find the flux through a given surface, we must integrate the flux per 
unit area through the surface. As in X.2.01, this means that we must assign 
at each point a quantity per unit area to each plane P in $T_xX$, which in this
case will mean the “flux density at x through a surface passing through x with
the attitude P”. For a flux of mass this will be a scalar for each P, summed
up by a skew-symmetric bilinear form $T_xX \times T_xX \to \mathbf{R}$, just as curvature is
skew-symmetric and bilinear $T_xM \times T_xM \to$ {“infinitesimal rotations”}. For
the flux of a covariant vector quantity it will be $T_xM \times T_xM \to T^*_xM$.

But this is per unit area. Fine for 3-dimensional flows, where we can
find the net flow into or out of a region by integrating over its 2-dimensional
boundary, but in four dimensions we shall have to integrate over 3-dimensional
hypersurface boundaries. Which would make the flow of a vector quantity a
4-tensor. Fortunately there is a less cumbersome description of it. Rather
than fix a plane in 3-space by giving two vectors that span it, or a hyperplane
in 4-space by giving three, we can in both cases give it as the kernel of a
cotangent vector, on 3-space or 4-space respectively. But if $P = \ker f$, then
also $P = \ker(\lambda f)$ for any $0 \ne \lambda \in \mathbf{R}$. How do we choose one particular
covector to label P?

In the Riemannian situation we can choose by requiring $f \cdot f = 1$, and
$f(c^*(t)) > 0$ when $c$ with $c(t) = x$ is a curve crossing $P$ in what has been
chosen as the “positive” direction for the surface: just choose $f = G_\downarrow(n)$,
where $n$ is the “positive unit normal” to $P$, in the language of electromagnetism
texts. But if $G$ is indefinite, $P$ may be degenerate. In this case all
the vectors normal to $P$ are in $P$, not crossing it (Fig. IV.1.5d); equivalently
$f \cdot f = 0$ for any $f$ with $\ker f = P$, so there is no unit covector to label
$P$. (And a smooth closed hypersurface such as bounds a compact region in
flat spacetime must have some degenerate tangent hyperplanes, by simple
topological arguments: Fig. VII.3.7.) So we must be a little more subtle.

Fig. 3.3

Instead of simply representing $P$, let $f$ represent an actual piece $A$ of area
with attitude $P$: say the parallelogram fixed by two vectors $u$, $v$ (Fig. 3.3).
(In spacetime, a piece $A$ of volume in the hyperplane $P$, fixed by three vectors
$u$, $v$, $x$.) Then we can define $f(w)$ as the volume (hypervolume) of the
parallelepiped $Q$ fixed by $u$, $v$, $w$ (hyperparallelepiped fixed by $u$, $v$, $x$, $w$),
which is well defined as $G$ is non-degenerate on the whole space, even if $P$
is degenerate. The technicalities (in particular those that fix the sign of $f$)
are gathered in Exercise 1, but geometric understanding is more important
in what follows.

The covector $f$ labels $A$ in a thoroughly “per area” way: if $A$ is doubled
(say by doubling $u$) so is $f$, but another $A'$ in $P$ with the same area (by any
skew-symmetric bilinear measure) as $A$ will give the same $f(w)$, for any $w$,
as $A$. If $G$ is Riemannian, we have a well defined idea of “unit area” on $P$
(say if $u$, $v$ are orthonormal, the area of $A$ is 1: cf. X.5.01) and for $A$ a unit
area, $f(n) = 1$ (Exercise 1e), so in this case $f$ is $G_\downarrow(n)$ as before - our new
method agrees with the old, where that one works.
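
The labelling can be tried numerically. The following is a minimal sketch, assuming orthonormal coordinates so that the volume form Det of Exercise 1 reduces to the ordinary coordinate determinant; the function names `det` and `area_covector` are illustrative and do not come from the text.

```python
from itertools import permutations

def det(rows):
    """Determinant by Leibniz expansion; adequate for 3x3 and 4x4 matrices."""
    n = len(rows)
    total = 0.0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1.0 if inversions % 2 else 1.0
        for i in range(n):
            term *= rows[i][perm[i]]
        total += term
    return total

def area_covector(*edges):
    """Components f_k = Det(edges..., e_k) in an orthonormal basis e_k (assumed)."""
    n = len(edges) + 1
    basis = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    return [det(list(edges) + [e_k]) for e_k in basis]

# Euclidean 3-space: unit square in the xy-plane, labelled by the unit normal.
f_unit = area_covector((1, 0, 0), (0, 1, 0))
# Doubling u doubles f; a sheared parallelogram of equal area gives the same f.
f_doubled = area_covector((2, 0, 0), (0, 1, 0))
f_sheared = area_covector((1, 1, 0), (0, 1, 0))

# Lorentz 4-space, signature (+,-,-,-): a degenerate (null) hyperplane
# spanned by the three vectors below, whose normal (1,1,0,0) lies inside it.
f_null = area_covector((1, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))
eta = [1.0, -1.0, -1.0, -1.0]
f_null_norm = sum(eta[k] * f_null[k] ** 2 for k in range(4))  # f . f
```

In the degenerate case the covector comes out as $-dx^0 + dx^1$: it is non-zero and vanishes on the hyperplane, but $f \cdot f = 0$, so it could not have been found by normalising.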

So, we label each “area (volume) in a plane (hyperplane) in $T_xX$, with
a choice of which way through is positive” by a cotangent vector $f \in T_x^*X$.
A description of the flux of any $\binom{k}{h}$-tensor quantity should say how much of
this quantity goes positively through such an “infinitesimal area (volume)”
at each $x$. That is, at each point $x \in X$ we should have a linear (why?)
map taking $f$ to a $\binom{k}{h}$-tensor in $(T^k_hX)_x$. That is a map $T_x^*X \to (T^k_hX)_x$,
equivalently an element of $L(T_x^*X; (T^k_hX)_x) \cong (T^{k+1}_hX)_x$. For a scalar flux
such as mass or heat, this means a vector in $T_xX$, since $k = h = 0$; actually
a contravariant vector at $x$, but used as a linear functional

$$T_x^*X \to \mathbb{R} : (\text{area (volume) labelled by } f) \mapsto (\text{flux through it}).$$

In the case of classical fluid mechanics the contravariant field appropriate
is exactly $\rho v$, where $\rho$ is density and $v$ is velocity, as we saw at first. But in
general such a decomposition is not possible.

3.02. The Flux of 4-Momentum. We see then that a flux of the $\binom{0}{1}$-tensor
quantity 4-momentum should be a field of maps $T_x : T_x^*X \to T_x^*X$, which are
thus actually operators on $T_x^*X$: otherwise described via the natural isomorphism
$L(V; V) \cong V^* \otimes V$ as a $\binom{1}{1}$-tensor field $T$ on $X$. (Not to be confused
with $T$ the torsion $\binom{1}{2}$-tensor VIII.5.05, which is always zero in relativity
theory as we use the Levi-Civita connection.) This is variously called the
matter tensor for the flux of matter described, the energy-momentum tensor
for it, or the stress or stress-energy tensor for reasons apparent in 3.04.

3.03. Components. Choose an inertial frame (cf. 2.01) $F$ for $X$ and a chart
$X \to \mathbb{R}^4$ with the $\partial_i$ orthonormal everywhere, $\partial_0 = {}^Fe_0$. In the manner of
3.01, the covector $dx^0 = G_\downarrow({}^Fe_0)$, with kernel $({}^Fe_0)^\perp$, represents unit volume
in the 3-space of entirely F spacelike vectors at a point $x$. So

$$T(dx^0) = T^0_0\,dx^0 + T^0_1\,dx^1 + T^0_2\,dx^2 + T^0_3\,dx^3$$

is the amount of 4-momentum “passing through the F present” of $x$, per
unit volume. Since “passing through the present” just means “now”, $T(dx^0)$
can be understood as the F density of 4-momentum at $x$. (Notice that this
must depend on $F$. Density as “amount per unit volume” depends on “unit
volume” which depends on “unit spacelike length” which depends on the observer.)
Separating this into F timelike and F spacelike parts as in 2.07, we
see that $T^0_0$ then represents F energy density and $(T^0_1\,dx^1 + T^0_2\,dx^2 + T^0_3\,dx^3)$
represents F momentum density, relative to $F$. If the matter whose motion is
being described is a fluid or solid moving at less than the speed of light, the
mass-energy density $T^0_0$ can be seen (relative to $F$) as a particular combination
of rest mass density and density of energy stored in the elastic forces in
the material.

Similarly for $i = 1$, 2 or 3, $T(dx^i) = T^i_j\,dx^j$ represents 4-momentum going
“sideways”; the flux per unit area through the hypersurface orthogonal to $\partial_i$,
in a sense determined by $\partial_i$. (Note the difference between “sense” and “direction”.
For instance, “We can finally see the Midnight Sun, having crossed
the Arctic Circle, Northwards” refers to the sense. We might have been travelling
in the direction East-North-East, not due North, as we crossed.) To an
observer at F rest, then, $T(dx^i)$ is seen as the flux of 4-momentum per unit
time, per unit area - time and space being illusorily unglued by his intrinsic,
local chart - across a surface orthogonal to $\partial_i$ and at rest. Taking this apart
as we did for $T(dx^0)$, $T^i_0$ is the $i$-component of the flux of F energy he sees,
which is thus described by the vector (as appropriate for the flux of a scalar)
$T^1_0\,\partial_1 + T^2_0\,\partial_2 + T^3_0\,\partial_3$. The covector $T^i_1\,dx^1 + T^i_2\,dx^2 + T^i_3\,dx^3$ is the
$i$-component of F momentum flux.

3.04. Self-Adjointness of T. Just as F momentum is $G_\downarrow$(F mass times
F velocity), for the above idealisation of particles at less than lightspeed, one
expects F momentum density for continuous matter to be $G_\downarrow$ of

(F mass-energy times F velocity) per unit volume

which can be rearranged as

(F energy per unit volume) times F velocity;

namely, F energy flux. In the component forms above for F momentum density
and F energy flux, this gives the equation

$$(*) \qquad T^0_1\,dx^1 + T^0_2\,dx^2 + T^0_3\,dx^3 = G_\downarrow(T^1_0\,\partial_1 + T^2_0\,\partial_2 + T^3_0\,\partial_3).$$

Since $G_\downarrow(\partial_0) = dx^0$ by construction and $G_\downarrow$ is linear, this implies that

$$T^0_0\,dx^0 + T^0_1\,dx^1 + T^0_2\,dx^2 + T^0_3\,dx^3 = G_\downarrow(T^0_0\,\partial_0 + T^1_0\,\partial_1 + T^2_0\,\partial_2 + T^3_0\,\partial_3),$$

that is,

$$\begin{aligned}
T(dx^0) &= G_\downarrow(T^*(\partial_0)) && \text{by III.1.06} \\
        &= G_\downarrow T^* G_\uparrow(dx^0) \\
        &= T^{\mathrm T}(dx^0) && \text{(cf. IV.2.08; also, $T$ is $T_x^*X \to T_x^*X$, not $T_xX \to T_xX$).}
\end{aligned}$$

So for the unit timelike covariant vector $dx^0$, $T^{\mathrm T}$ has the same effect as $T$.
But any unit forward timelike covariant vector could have been $dx^0$: this
is just a matter of labelling, not physics. So for any unit forward timelike
covariant vector, $f$,

$$T(f) = T^{\mathrm T}(f).$$

But since we can find a basis $f^0, f^1, f^2, f^3$ for $T_x^*X$ consisting only of such
vectors, this means that for any $g \in T_x^*X$ whatever,

$$T(g) = T(g_i f^i) = g_i (T f^i) = g_i (T^{\mathrm T} f^i) = T^{\mathrm T} g,$$

so $T$ is self-adjoint.
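
In orthonormal components the self-adjointness condition can be checked mechanically. The sketch below assumes the Lorentz metric diag(+1, -1, -1, -1) used in this chapter and stores the matrix of $T$ as `A[m][j]` $= T^m_j$; the helper names and the sample numbers are hypothetical. With this convention the adjoint $G_\downarrow T^* G_\uparrow$ has matrix entries $\eta_{mm} A_{jm} \eta_{jj}$, so self-adjointness means $T^0_i = -T^i_0$ and $T^i_j = T^j_i$ for spatial $i, j$.

```python
ETA = [1.0, -1.0, -1.0, -1.0]  # diagonal Lorentz metric, signature (+,-,-,-)

def mixed(T_up):
    """Lower the second index: A[m][j] = T^m_j = T^{mj} * eta_jj."""
    return [[T_up[m][j] * ETA[j] for j in range(4)] for m in range(4)]

def adjoint(A):
    """Matrix of the adjoint G_down T* G_up: B[m][j] = eta_mm * A[j][m] * eta_jj."""
    return [[ETA[m] * A[j][m] * ETA[j] for j in range(4)] for m in range(4)]

def is_self_adjoint(A):
    return A == adjoint(A)

# Hypothetical symmetric contravariant components T^{mj} (non-polar matter).
symmetric_up = [[4, 1, 2, 3],
                [1, 5, 6, 7],
                [2, 6, 8, 9],
                [3, 7, 9, 10]]
# A "polar" variant: momentum density and energy flux no longer match.
polar_up = [row[:] for row in symmetric_up]
polar_up[1][0] = 2

A_sym = mixed(symmetric_up)
A_polar = mixed(polar_up)
```

Here `is_self_adjoint(A_sym)` holds while `is_self_adjoint(A_polar)` fails, which is the component form of the distinction between non-polar and polar matter.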

This illustrates beautifully the advantages of not restricting ourselves to
the bases developed in VII.§4, or always requiring orthogonality. Usually $(*)$
is produced as above, and the equations

$$T^i_j = T^j_i \quad \text{for } 1 \le i, j \le 3$$

by an entirely separate physical argument. (See, for example, [Misner, Thorne
and Wheeler].)

Unfortunately, both physical arguments are invalid. In fact, there is an
extensive literature on polar materials (those with $T^{\mathrm T} \neq T$), beginning in
1907. (See [Truesdell] for an account.) However it has been the concern
chiefly of solid state physicists, in little contact with cosmologists who are
overwhelmingly gaseous.

The catch in the argument for $(*)$ was the purely $G_\downarrow$(mass times velocity)
view taken of momentum, for continuous matter as for particles. We have not 
discussed angular momentum, and cannot do so geometrically until we have 
the Lie group language deferred to a later volume. But if the particles in 3.1 
have it, they carry torque as well as pressure, tension and shear effects. Of 
course the smaller a ball is, the faster it has to spin to have a given non-zero 
angular momentum; but the absurdity of the “limiting notion” of a point 
particle spinning infinitely fast just shows that angular momentum has to be 
more subtly conceived. The dipole (a point particle with an electric field but 
no net charge) is closely analogous. Ultimately the point mass gives far more 
trouble than any other attribute of a particle (cf. XII.4); as field quantities 
dipole moment and angular momentum, properly relativised, cause no more 
paradoxes than mass-energy density. 

That being said, symmetric stress tensors are in the practical analysis
of matter far more common. We shall return to the polar case in the next
volume, when we examine the tensor geometry of material physics: but while
in this one we have sophisticated language for discussing space and spacetime,
for the matter that happens in it we have really only a few pictures. So
for now we keep to a non-polar view of matter. In the next chapter - where
we equate $T$ to a tensor self-adjoint for geometrical reasons - we use implicitly,
without change of name, the symmetrised matter tensor $\frac{1}{2}(T + T^{\mathrm T})$.
This amounts to an extra physical hypothesis (usually made tacitly): that a
density of angular momentum has no influence on the curvature of spacetime.

3.05. Signs. As we remarked above, $T^1_1$ represents force across a surface $S$
orthogonal to $\partial_1$. Which sign corresponds to pressure in the $\partial_1$-direction, as
opposed to tension?

Think of the momentum carried by the particle $p$ in Fig. 3.4 leaving one
side (pushing back the matter $q$ it leaves), crossing $S$ in the positive sense
according to $\partial_1$, and arriving on the other side, pushing the matter $r$ there
forward. This evidently represents pressure, acting to separate the two sides.
The particle’s 4-velocity $v$ has $dx^1(v) > 0$, since this is what “the positive
sense according to $\partial_1$” means. Now, $dx^1(\partial_1) = 1 > 0$ by the definition of the
dual basis, VII.4.01, 4.02, so $(mG_\downarrow(v)) \cdot dx^1$ is positive too, since this is just
$m\,dx^1(v)$. So if the particle has F energy $E$ and F momentum $p_F$, and thus
4-momentum $mG_\downarrow(v) = E\,dx^0 + p_F$, to transfer, we have

$$(E\,dx^0 + p_F) \cdot dx^1 > 0$$

and so

$$p_F \cdot dx^1 > 0$$

since $dx^0 \cdot dx^1 = 0$. So since $dx^1$ is spacelike, the 1-component of $p_F$ is
negative.

Fig. 3.4

In the smooth version, $T^1_1$ is the 1-component of the momentum flowing
across $S$, approximating our “network of particles” picture. So we see that
$T^1_1$ is negative in the case of pressure, positive for tension.

Notice that if we used a metric of signature +2 then these meanings
would be reversed, so the same material situation would be represented by
minus the matter tensor appropriate with the Lorentz metric we use. But
the fully covariant or fully contravariant forms, $T_{ij}\,dx^i \otimes dx^j$ or $T^{ij}\,\partial_i \otimes \partial_j$,
would have the same sign in either version as the minus signs on $G$ and $T$
cancel.

3.06. Principal Directions. If $T_x$ is self-adjoint and has a timelike eigenvector
$f \in T_x^*X$, then by IV.4.13 $T_x^*X$ has an orthonormal basis of eigenvectors
of $T_x$. Choose an inertial frame $F$ by making ${}^Fe_0$ the unit forward
vector in the eigenspace $\{\, af \mid a \in \mathbb{R} \,\}$, and an affine chart making $\partial_0$, $\partial_1$,
$\partial_2$, $\partial_3$ orthonormal eigenvectors of $T_x$ with $\partial_0 = {}^Fe_0$. The parts of the
matrix of $T_x$ representing F momentum density and flux of F energy vanish
in these coordinates. $F$ may then be considered as the “instantaneous rest
frame” for the described matter at $x$, though nothing identifiable as part of
the mass-energy may be at rest. (For example in a solid the “solid matter”
may be moving one way, the “elastic energy” travelling along it the other
way relative to $F$, resulting in a zero net flow.) In the spacelike part too,
the off-diagonal elements vanish. This may be seen as the reduction of the
stresses at $x$ to pressure or tension in the three spacelike principal directions,
with no shear stress in the planes orthogonal to these directions.

If $T_x$ describes a situation where the net flow of energy is at the speed
of light (so the matter at $x$ is present entirely as radiation, or equivalently
as particles with zero rest mass, all going in precisely the same direction),
it evidently has a null eigenvector. Clearly there is no frame for which the
F energy flow is zero, so it does not have a timelike one. IV.4.13 thus
does not apply, and $T_x$ turns out to be in fact not diagonalisable. This highly
special situation, however, is the only one among all those arising for fields
so far observed in physics for which $T$ fails to have a full set of principal
stresses.
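
The non-diagonalisable case can be exhibited directly. As a sketch, assume the simplest model of such radiation, $T^{ij} = k^i k^j$ with $k$ a null vector (this specific form is an illustrative assumption, not taken from the text), in orthonormal coordinates with metric diag(+1, -1, -1, -1). The matrix of $T$ is then non-zero but squares to zero, and a nilpotent operator that is not zero cannot be diagonalised.

```python
ETA = [1.0, -1.0, -1.0, -1.0]       # signature (+,-,-,-)
k_up = [1.0, 1.0, 0.0, 0.0]         # null vector: light moving along the 1-axis
k_down = [ETA[m] * k_up[m] for m in range(4)]
k_norm = sum(k_up[m] * k_down[m] for m in range(4))   # k . k

# Matrix of T on covectors: T(dx^m) = A[m][j] dx^j, with A[m][j] = k^m k_j.
A = [[k_up[m] * k_down[j] for j in range(4)] for m in range(4)]

def matmul(P, Q):
    return [[sum(P[m][i] * Q[i][j] for i in range(4)) for j in range(4)]
            for m in range(4)]

def act(A, f):
    """Apply T to the covector f = f_m dx^m: components (f_m A[m][j])_j."""
    return [sum(f[m] * A[m][j] for m in range(4)) for j in range(4)]

A_squared = matmul(A, A)
image_of_null = act(A, k_down)               # the null covector is an eigenvector
image_of_dx0 = act(A, [1.0, 0.0, 0.0, 0.0])  # dx^0 is not
```

The null covector $G_\downarrow(k)$ is an eigenvector with eigenvalue 0, while the timelike covector $dx^0$ is sent to $dx^0 - dx^1$, which is not a multiple of it.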

3.07. Conservation. The conservation law for 4-momentum in collisions of
particles (2.07) essentially treats a collision as happening in a small “black
box”: we do not know what happens in there, very often, but we insist
that what goes in must equal what comes out. The same idea gives the
conservation law for fields.

Consider a small parallel-sided box $Q$ in spacetime: the sides are four
pairs of parallelepipeds. We represent one pair by parallelograms in Fig. 3.5.
Evidently the flow through $P'$ does not equal that through $P$, in general:
that would mean for instance that if $P$ is in the “now” of $x$ in the frame $F$,
$P'$ in the “now” of $x'$, 4-momentum F density does not change between $x$ and
$x'$. In fact $T$ would have to be a constant field, not just a conserved one.
What we must require is that what gets lost (or appears) between $P$ and $P'$
must come out (go in) through the other six sides of the box. So if we take
the four pairs $P_\mu$, $P'_\mu$, $\mu = 0, 1, 2, 3$ of sides of $Q$ we want the sum

$$\sum_{\mu=0}^{3} (\text{Flux through } P'_\mu - \text{Flux through } P_\mu)$$

to vanish. In the limit as $Q$ approaches zero size, the 4-momentum flux
through the various sides “becomes” (is increasingly well approximated by)
the values of $T$ on the cotangent vectors $f^\mu$ labelling the “per area and attitude”
of the sides $P_\mu$, $P'_\mu$. So for the limit of the above, a natural candidate
is the limit of

$$\sum_{\mu=0}^{3} \left( \frac{T_{x_\mu}(f^\mu) - T_x(f^\mu)}{\text{separation of } x \text{ and } x_\mu} \right)$$

where $x$ is the corner kept fixed as $Q$ shrinks and the divisors stop the sum
becoming trivially zero as the terms above approach each other.

Fig. 3.5

To be a little more careful, notice that $f^\mu$ is strictly in $T_x^*X$, which is not
the domain of $T_{x_\mu}$, and that the image of $T_{x_\mu}$ is not in the same vector space
$T_x^*X$ as that of $T_x$, so they cannot strictly be subtracted. We must correct
this by parallel transport. (In the particular case of flat spacetime we could
go via the space of free vectors, since parallelism is independent of route, but
let us be more general.) Then, if $\tau_h$ is parallel transport $T_xX \to T_{c_\mu(h)}X$
along the edge from $x$ to $x_\mu$ parametrised by the curve $c_\mu$ with $c_\mu(0) = x$,
the limit becomes

$$\sum_{\mu=0}^{3} \lim_{h \to 0} \frac{\tau_h^{-1}\big(T_{c_\mu(h)}(\tau_h f^\mu)\big) - T_x(f^\mu)}{h} = \sum_{\mu=0}^{3} \big(\nabla_{c_\mu^*(0)} T\big) f^\mu$$

by reference to VIII.7.01, 7.02. Notice, though, that if we are to have a box as
we go to the limit, rather than some interlocking slices (Fig. 3.6), we must have
each $f^\mu$ label the particular piece of area in $P_\mu$ given by the three edge vectors
$c_\nu^*(0)$, $\nu \neq \mu$. If we so parametrise the edges that the hypervolume of the
box in $T_xX$ fixed by the four $c_\mu^*(0)$ is unity, we have the $c_\mu^*(0)$ as a basis for
$T_xX$ with the corresponding $f^\mu$ simply the dual basis for $T_x^*X$ (Exercise 2).
Choosing a chart which gives these bases as $\partial_\mu$, $dx^\mu$ at $x$, we have

Fig. 3.6

$$\nabla_{c_\mu^*(0)} T = \nabla_{\partial_\mu}(T^i_j\,\partial_i \otimes dx^j) = T^i_{j;\mu}\,\partial_i \otimes dx^j \quad \text{by VIII.7.08,}$$

and so

$$\big(\nabla_{c_\mu^*(0)} T\big) f^\nu = \text{Contraction of } (T^i_{j;\mu}\,\partial_i \otimes dx^j) \otimes dx^\nu \text{ over } i \text{ and } \nu = T^\nu_{j;\mu}\,dx^j.$$

Hence, provided the $c_\mu^*(0)$ fix a unit hypervolume,

$$\sum_{\mu=0}^{3} \big(\nabla_{c_\mu^*(0)} T\big) f^\mu = T^\mu_{j;\mu}\,dx^j,$$

which by X.6.12 is called the divergence of the tensor field $T$ and is independent
of the $c_\mu^*(0)$. We now have a geometric meaning for divergence in
general: using the isomorphism

$$(T^1_kX)_x \cong L(T_x^*X;\; T_x^*X \otimes \cdots \otimes T_x^*X)$$

we think of a $\binom{1}{k}$-tensor field as describing the flux of a $\binom{0}{k}$-tensor quantity,
following 3.01. Its divergence is thus “the amount of the $\binom{0}{k}$-tensor that is appearing
or vanishing” per unit volume, at each point. The field conservation
law for 4-momentum thus takes the form

$$\operatorname{div} T = 0$$

or

$$T^i_{j;i} = 0,$$

in coordinates.
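
A numerical check of the coordinate form may help fix ideas. The sketch below assumes flat spacetime with an inertial chart, so that the covariant derivatives reduce to partial derivatives, and takes as a hypothetical example a blob of pressureless matter drifting at speed `BETA` along the 1-axis, with $T^i_j = \rho\,u^i u_j$; the density profile and all names are illustrative. Central differences then give $\partial_i T^i_j \approx 0$ when the density profile moves with the matter, and not otherwise.

```python
import math

BETA = 0.6                                    # drift speed, in units with c = 1
GAMMA = 1.0 / math.sqrt(1.0 - BETA ** 2)
ETA = [1.0, -1.0, -1.0, -1.0]                 # signature (+,-,-,-)
U_UP = [GAMMA, GAMMA * BETA, 0.0, 0.0]        # constant 4-velocity
U_DOWN = [ETA[m] * U_UP[m] for m in range(4)]

def rho_moving(p):
    """Density profile carried along with the matter: a function of x - BETA*t."""
    t, x, _, _ = p
    return math.exp(-(x - BETA * t) ** 2)

def rho_static(p):
    """Profile that stays put although the matter moves: not conserved."""
    _, x, _, _ = p
    return math.exp(-x ** 2)

def stress(rho):
    """Return the field p -> T^i_j(p) = rho(p) u^i u_j as a function of (p, i, j)."""
    return lambda p, i, j: rho(p) * U_UP[i] * U_DOWN[j]

def divergence(T, p, h=1e-5):
    """Components (div T)_j = sum_i d_i T^i_j, by central differences."""
    components = []
    for j in range(4):
        total = 0.0
        for i in range(4):
            fwd, bwd = list(p), list(p)
            fwd[i] += h
            bwd[i] -= h
            total += (T(fwd, i, j) - T(bwd, i, j)) / (2 * h)
        components.append(total)
    return components

point = [0.3, 0.5, 0.0, 0.0]
div_conserved = divergence(stress(rho_moving), point)
div_violated = divergence(stress(rho_static), point)
```

The first divergence vanishes to within discretisation error; the second does not, since matter would then be appearing and vanishing to keep the profile fixed.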

The above reasoning, of course, did not establish this form for the law,
but only motivated it. Of the quantities involved only the $\nabla_{\partial_\mu}T$ and $\operatorname{div} T$
have been strictly defined, in the absence of the theory of integration on
manifolds that would let us talk rigorously about the flux through a hypersurface
of a small but finite size. However, in flat spaces the integral of the
divergence of a flux $F$ of any $w$, over a 4-dimensional region $U$, is exactly
the total flux of $w$ into or out of $U$ through its 3-dimensional boundary. So
as we shrink $U$ to a point, whether or not it is box-shaped, the limiting flux
in or out per unit volume is indeed $\operatorname{div} F$.

In a curved spacetime $M$ we cannot simply add “tensors per unit volume,
or hypervolume” at different points, so things are a little more complicated.
But we can state the following “integral version” of the conservation law.
Suppose the energy density $T_x(f) \cdot f \ge 0$ for all $x$ and all choices of timelike
“rest velocity” $G_\uparrow(f)$ at $x$, and that the corresponding 4-momentum density
is never a spacelike covector. Then, if $T = 0$ everywhere on some spacelike
hypersurface (cf. VII.2.03) $S$ of $M$, we find that $T = 0$ everywhere on $M$,
assuming $M$ itself has no singularities. So the absence of matter, at least,
is conserved in a simple sense. From this we can deduce results on the way
that determining non-zero $T$ on $S$ determines $T$ on $M$, analogously to the
way fixing Newtonian positions and momenta at time $t_0$ determines them for
other $t$.

The conditions assumed, non-negative energy and no spacelike 4-momentum
(such as would be possessed by particles going faster than light), are
plainly necessary for this result. Without the first, we would find a solution
where matter fields with stress tensors cancelling (some having negative
energy) appear together forward of $S$ and move off in different directions.
Without the second, matter could “come in from infinity at infinite speed”,
following curves that stay forward of $S$, and decay into ordinary matter that
then proceeds forward in time. So in neither case would the solution, even
for vacuum initial conditions, be unique.

(It is in this sense, the uniqueness of solutions, that we referred to determining
$T$ - the existence of solutions only holds locally. For example, in flat
spacetime $X$ start with a matter field in a compact region of some spacelike
submanifold $S$ with the matter having no internal forces, ever. That is with
$T^i_j = 0$, $i, j > 0$, always, in some orthonormal coordinates. This is called
for obvious reasons a “dust” stress tensor. Suppose all the 4-velocities of
the dust particles point at the same element $x$ of $X$. Then the solution for
$T$ grows without bound as we approach $x$, and so cannot exist at $x$, even
without invoking gravitation and its effect on the metric - which produces
singularities under far less artificial assumptions.)

Exercises XI. 3 

1. a) Show that the set of bases of an $n$-dimensional vector space $V$ fall into
two classes, such that for two bases $\beta = b_1, \ldots, b_n$ and $\beta' = b'_1, \ldots, b'_n$
the linear operator $a^i b_i \mapsto a^i b'_i$ has positive determinant if $\beta$ and $\beta'$
belong to the same class, negative otherwise.

A choice of one particular class (say, for three-dimensional “physical”
spaces by the right-hand rule) is called an orientation. With such
a choice made, we say $V$ is oriented, and a basis is called positively or
negatively oriented according as it is in the chosen class or the other.

b) If $V$ is an oriented $n$-dimensional metric vector space, use Exercise
V.1.11 to show that there is a unique skew-symmetric $n$-linear
form Det on $V$ (as distinct from the function $\det : L(V; V) \to \mathbb{R}$) such
that if $b_1, \ldots, b_n$ is an orthonormal basis, $\mathrm{Det}(b_1, \ldots, b_n)$ is $+1$ or
$-1$ according as $b_1, \ldots, b_n$ is positively or negatively oriented. What
is the result of changing the order of the basis vectors?

For any ordered $n$-tuple $(v_1, \ldots, v_n)$ of vectors in $V$, we call
$\mathrm{Det}(v_1, \ldots, v_n)$ the volume, with respect to the particular orientation
and metric, of the parallelepiped fixed by $v_1, \ldots, v_n$.

c) If $|\mathrm{Det}|$, the positive volume with respect to the metric, is defined by
$|\mathrm{Det}|(v_1, \ldots, v_n) = |\mathrm{Det}(v_1, \ldots, v_n)|$, show that it is independent of
the orientation used to define Det but not multilinear.

d) If $V$ is an $n$-dimensional metric vector space, $P \subset V$ a hyperplane,
$p_1, \ldots, p_{n-1} \in P$ linearly independent, and $v \notin P$, show that there is
exactly one $f \in V^*$ such that

(i) $f(v) > 0$

(ii) $|f(w)| = |\mathrm{Det}|(p_1, \ldots, p_{n-1}, w)$ for all $w \in V$.

Show that $f$ depends only on $v$ and the volume of the parallelepiped
in $P$ fixed by $p_1, \ldots, p_{n-1}$, whatever measure of volume is used in $P$.

e) Suppose $G$ is an inner product. Prove that there are exactly two unit
vectors $v$ with $v^\perp = P$. If one is chosen as the “unit positive normal”
and denoted by $n$, show that if $p_1, \ldots, p_{n-1}$ are orthonormal, the
$f$ given by (d) with $f(n) > 0$, $|f(w)| = |\mathrm{Det}|(p_1, \ldots, p_{n-1}, w)$ is
exactly $G_\downarrow(n)$, so $f(w) = w \cdot n$, and $f(n) = 1$.

2. a) Show that if $\mathrm{Det}(v_1, \ldots, v_n) = 1$ and we use $v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n$
for $p_1, \ldots, p_{n-1}$ in Exercise 1d, and $v_i$ for $v$, then $v_1, \ldots, v_n$ are a basis
for $V$ with $f$ given exactly by the dual basis vector $v^i$.

b) Show that for any linearly independent $v_1, \ldots, v_n$ there exists a number
$a > 0$ such that $|\mathrm{Det}|(a v_1, v_2, \ldots, v_n) = 1$. Deduce that we may
parametrise the edges of the “box” in 3.07 from $x$ to $x_\mu$, $\mu = 0, 1, 2, 3$
so as to make

$$|\mathrm{Det}|(c_0^*(0), \ldots, c_3^*(0)) = 1.$$

3. Find the component forms of F energy density etc. (3.03) in unnatural
units (2.08).


4. Forces 


So far we have used the word “force” only in the context of an inertial
frame, to give Newtonian analogues for the “entirely F spacelike to entirely
F spacelike” part of the stress tensor $T$. Is there an invariant, relativistic analogue
for the force concept, as 4-velocity is analogous to Newtonian velocity?

For a matter field, the analogue is $T$ itself. Newtonian force means
flow of Newtonian momentum (Exercise 1), and $T$ is exactly the flow of 4-momentum.
(This point of view suggests thinking of the classical stress tensor
as a single “force”, which is entirely reasonable. Newtonian 3-space no more
has innately given coordinates than does flat or bent spacetime, so the tensor
is more fundamental than the set of components it has in some chart, which
are interpreted as shear forces, etc. Thus even Newtonian “force” graduates
from a $\binom{0}{1}$-tensor to a $\binom{1}{1}$-tensor.)

For a particle, the natural candidate for a relativistic or 4-force on it is
“rate of change of its 4-momentum along its world line”. (The Newtonian
$F = ma$ is really a definition of force more than a law of physics.) In a
general spacetime, this rate of change should evidently be defined by covariant
differentiation. Consider a history $c$ with constant rest mass $m$, parametrised
by proper time $\sigma$. Then

$$p(\sigma) \cdot p(\sigma) = m^2,$$

constant, so

$$\nabla_{c^*}(p \cdot p)(\sigma) = 0, \quad \forall \sigma.$$

Therefore

$$\nabla_{c^*}p \cdot p + p \cdot \nabla_{c^*}p = 0,$$

by VIII.7.03, 7.06 (just as for contravariant vectors). Hence,

$$\nabla_{c^*}p \cdot p = 0,$$

by the symmetry of $G^*$.

Thus for such a history the 4-force must be always orthogonal to its
4-momentum. Since no one non-zero cotangent vector at $x \in X$ can be
orthogonal to $G_\downarrow(c^*(\sigma))$ for all curves $c$ with $c(\sigma) = x$, the 4-force on the
history must necessarily depend on its 4-velocity, or vanish.

This recalls the way that the Newtonian force on a charged particle in
a magnetic field depends on its velocity; this indeed is the spacelike part of
a relativistic example. The electromagnetic field is geometrically a 2-form,
or skew-symmetric $\binom{0}{2}$-tensor field, on spacetime. (See, for example, [Misner,
Thorne and Wheeler].) Contract this with the $\binom{1}{0}$-tensor $e(c^*(\sigma))$, where
$e \in \mathbb{R}$ is the charge on the particle, and the result is a $\binom{0}{1}$-tensor, the 4-force.
This is the simplest possible relativistic “field of force”. We must have a
map taking 4-velocity (or 4-momentum) to 4-force, so we must have a tensor
of total degree at least 2. (4-force might depend on other things, beside
4-velocity: we have shown only that it must vary with that at least, if we
are to have particles with constant rest mass.) The Newtonian vector field
of force, typified by the electric and gravitational fields, whose effect on a
particle depends only on its position and perhaps a scalar such as charge, is
impossible.
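
The orthogonality is automatic for any skew-symmetric field, as a short exact computation shows. This is a sketch only: the components `F[m][n]` below are arbitrary made-up numbers standing for a 2-form in an inertial chart with metric diag(+1, -1, -1, -1), and the function names are illustrative. Since the 4-momentum is $mG_\downarrow(u)$, the 4-force $f$ is orthogonal to it exactly when $f(u) = 0$.

```python
from fractions import Fraction as Fr

# Hypothetical components F[m][n] = F_{mn} of a skew-symmetric 2-form.
F = [[0, 2, -1, 3],
     [-2, 0, 5, -4],
     [1, -5, 0, 7],
     [-3, 4, -7, 0]]

def four_force(F, charge, u_up):
    """f_m = e * F_{mn} u^n: the 2-form contracted with e times the 4-velocity."""
    return [charge * sum(F[m][n] * u_up[n] for n in range(4)) for m in range(4)]

def pairing(f_down, u_up):
    """Value f(u) of a covector on a vector."""
    return sum(f_down[m] * u_up[m] for m in range(4))

# Two unit forward timelike 4-velocities (exact rationals, u . u = 1).
u1 = [Fr(5, 3), Fr(4, 3), Fr(0), Fr(0)]
u2 = [Fr(13, 5), Fr(0), Fr(12, 5), Fr(0)]

f1 = four_force(F, 1, u1)
f2 = four_force(F, 1, u2)

# A velocity-independent covector "force", in the Newtonian style.
fixed = [0, 1, 0, 0]
```

Both `pairing(f1, u1)` and `pairing(f2, u2)` are exactly zero, while the two forces differ; the fixed covector fails the test for `u1`, illustrating why a velocity-independent force field cannot preserve rest mass for every history.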

Electromagnetic forces thus “relativise” beautifully into special relativistic
language (and indeed are simplified by it). In fact the group of Lorentz
transformations was discovered before the notion of spacetime or the Lorentz
metric, as exactly the transformations that left Maxwell’s laws invariant.
The behaviour of this field - in particular, of electromagnetic radiation, light
especially - played a crucial role in the origin of special relativity.

What about the other great force field of classical physics: gravitation? 

It cannot be a $\binom{0}{1}$-tensor field as in Newton’s theory, let alone take
the ultra-convenient form $d\phi$ for $\phi : X \to \mathbb{R}$, as long as we suppose $m^2$
fixed for each particle, for the reasons above. Letting rest mass vary leads
to worse confusion. And it turns out that no effort to describe gravitation
as a higher-order tensor field on flat spacetime has succeeded, either. All
attempts have either broken down on inconsistencies, internal or with the
facts, or made the flatness of the underlying space physically undetectable
since no physical quantity is described as travelling by the parallel transport
of the flat connection, which thus drops out of sight. Nor is a purely “force
field” theory of gravitation greatly to be expected, as we see in the next
section.


Exercises XI.4 

1. In Newtonian terms, the force between two bits of matter is the flux 
of momentum between them: the net force on one bit is the net flux 
between it and all others. 








Fig. 4.1 


a) Describe qualitatively the flux of momentum along the parts of the 
object in Fig. 4.1a, with no external forces and moving at constant 
velocity with no rotation. 

b) Draw in a longitudinal cross-section the flux of each of the three components
(in the obvious $(x, y, z)$ coordinates) of the circular beam of
Fig. 4.1b, at rest with equal and opposite axial forces pulling at the
centres of its ends. Notice that a flux of a quantity round something
need not change the quantity’s density at any point (a), even if that
density is zero (b).

c) Describe these two situations relativistically. 

5. Gravitational Red Shift and Curvature 

Suppose that spacetime is flat, and consider the gravitation due to an inhabitable
ball $B$ of matter at rest, everywhere, in some inertial frame $F$. Let
L and U be experimenters at F rest, L on the surface of $B$, U some F distance
directly above L. Suppose L has a perfectly efficient machine for turning
rest mass into a tight beam of radiation: U has an equally efficient device for
turning radiant energy into rest mass. L starts with a supply $m$ of rest mass
at F rest, which he turns into radiation and beams up to U: she restores it to
matter and drops it back to him. If the amount of rest mass U reconstitutes
is the same amount $m$ that L started with, he gets back his original investment
of mass/energy plus the kinetic energy gained by the mass in its fall.
He can use this bonus to run his sewing-machine while beaming $m$ rest mass
back up to U for her to drop again, ... etc. L and U would have between
them a perpetual motion machine. Even if their separate machines were not
perfectly efficient (indeed, all such devices known are so far from it that L
and U would have a net loss of useful energy; L would do much better to
point his beam at a boiler) the arrangement would violate conservation of
mass/energy, as measured in the frame $F$.

Rather than believe this, it is natural to suppose that not as much mass/energy
reaches U as left L; the radiation reaches U with less energy than it
had when it left L. This decrease is indeed experimentally observed, and in
the right quantity, so conservation of mass/energy is inviolate.

Fig. 5.1

Fig. 5.2

Thus far, we could still think of gravity as a “force” against which the 
radiation does “work” in rising, and so loses energy. But for radiation, energy 
is proportional to frequency (2.09), so it must arrive with lower frequency 
than it left. This is the gravitational red shift , because it moves light towards 
the red end of the spectrum. It makes flat spacetime seem very unphysical, 
by the following argument, due to Schild. 

Assume L and U in positions as before, following world lines $w_L$, $w_U$ at
F rest one above the other (Fig. 5.2). Suppose L beams upward a continuous
signal whose frequency he measures as $\nu$. U receives a continuous signal
whose frequency she measures as $\nu'$. Consider the track in spacetime of one
wave crest emitted at $e_1$, received at $r_1$, and another crest going from $e_2$
to $r_2$, emitted $N$ troughs later. (Or the analogue to crests and troughs for
transverse radiation.) If experiments are repeatable and spacetime has an
affine structure, then the vectors $d(e_1, r_1)$ and $d(e_2, r_2)$ given by it must be
equal. So the straight line $S_2$ through $e_2$ and $r_2$ is parallel to $S_1$ through $e_1$
and $r_1$.

Now the world lines $w_L$ and $w_U$ are both at F rest, and hence parallel, by
assumption. So we have a parallelogram $e_1 r_1 r_2 e_2$. But the length of the side
$e_1 e_2$ is measured by L as $N/\nu$ ($N$ periods of radiation with frequency $\nu$) and
that of $r_1 r_2$ by U as $N/\nu'$, which is greater, as $\nu' < \nu$ by the gravitational
red shift. But a parallelogram with unequal opposite sides is as impossible in
Minkowski space as in a Euclidean space (Exercise 1). Hence either there is
curvature inside the quadrilateral fixed by the four world lines, or if the metric
is flat at least one observer is measuring his proper time by something other
than arc length. The metric given by measurements is essentially curved.
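
The flat-space half of the argument can be made concrete in two dimensions. In this sketch the events are pairs $(t, z)$ with $c = 1$, the metric is the constant Lorentz one, and the numbers `height` and `emission_gap` are hypothetical. Translating the second crest by the same displacement vector as the first forces the reception interval to equal the emission interval, so a flat metric predicts $\nu' = \nu$ and no red shift.

```python
from fractions import Fraction as Fr

def interval_sq(a, b):
    """Squared Minkowski interval between events a, b = (t, z), with c = 1."""
    dt, dz = b[0] - a[0], b[1] - a[1]
    return dt * dt - dz * dz

def displacement(a, b):
    return (b[0] - a[0], b[1] - a[1])

def translate(p, d):
    return (p[0] + d[0], p[1] + d[1])

height = Fr(3)              # hypothetical separation of L and U
emission_gap = Fr(1, 2)     # hypothetical proper time N/nu between the crests

e1 = (Fr(0), Fr(0))                     # first crest leaves L
r1 = (height, height)                   # and reaches U along a light ray
e2 = (emission_gap, Fr(0))              # second crest leaves L, still at rest
r2 = translate(e2, displacement(e1, r1))  # affine structure: same vector d(e1, r1)

emit_sq = interval_sq(e1, e2)   # (N/nu)^2 as measured by L
recv_sq = interval_sq(r1, r2)   # (N/nu')^2 as measured by U
```

Both squared intervals come out equal, so the observed inequality $N/\nu' > N/\nu$ cannot be accommodated without giving up either the flat metric or the identification of proper time with arc length.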

In the case of the metric we used for discussing mirages (Exercise IX.4.6, 
Exercise X.1.6) other experiments were possible, giving physical meaning to 
a metric for space without the curvature we found for $k^2G$. For spacetime
no such experiments have been found: if there is an underlying flat metric it 
is lying very low. 


Exercises XI.5 

1. Let $X$ be an affine space, and $S_1$, $S_2$, $T_1$, $T_2$ be one-dimensional affine
subspaces of $X$ such that $S_1$, $T_1$ are parallel translates of $S_2$, $T_2$ respectively
and each $S_i \cap T_j$ consists of exactly one point $p_{ij} \in X$.
Show that $d(p_{11}, p_{12}) = d(p_{21}, p_{22})$, $d(p_{11}, p_{21}) = d(p_{12}, p_{22})$ and deduce
that for any constant metric tensor on $X$, opposite sides of a
parallelogram have equal arc length (however parametrised).

Nagasena: Well, O king, will sticks and clods and cudgels and 
clubs find a resting-place in the air, in the same 
way as they do on the ground? 

Milinda: No, Sir. 

Nagasena: But what is the reason why they come to rest on the earth, 
when they will not stand in the air? 

Milinda: There is no cause in the air for their stability, and 
without a cause they will not stand. 

The Questions of King Milinda 


1. How Geometry Governs Matter 

Aristotle and Newton held that things fall because they are pulled to the 
earth; Nagasena the sage and Einstein, that they fall because nothing stops 
them from falling. The difference is a profound one. 

We saw at the end of the last chapter the difficulty of describing gravity 
as a force in a flat spacetime. In this chapter we see that once we bring in 
curvature we do not need to call it a force at all. Newton’s first law, stating 
that a particle moves on a straight line in space unless a force such as gravity 
acts on it, is replaced by the principle that it follows a geodesic in spacetime 
unless a 4-force acts on it, and gravity is not a 4-force. It is just the shape 
of space, which determines the geodesics. 

1.01. The Equivalence Principle. Einstein’s equivalence principle is often 
stated as: Experiments in a closed box cannot distinguish between the box 
being in a gravitational field, not changing with time, and its being under 
uniform acceleration in a flat spacetime where no gravitational forces act. 

Thus formulated it is obviously false: in Fig. 1.1 falling objects converge 
on each other, in a way barely affected by their own masses, inconsistently 
with acceleration of the box. We shall refine it in several stages to find a 
precise and tenable statement. First, evidently we should consider a sufficiently 
small box for such effects to be undetectable. 

In the same way, if spacetime is curved in the metric given by distance 
and time measurements, as we have seen it is, its curvature will be there 
inside a small box to show that it is not in flat spacetime. The distortion 
involved in making flat maps of spheres is there even in a plane mapping of 
the surface and boundary of a duckpond. But the errors in measurement in 
mapping a pond are sufficient to conceal its non-planar character (indeed the 
wind- or duck-provoked variations in it swamp its sphericity) so we may for 
suitable local purposes treat it as flat. We do the same with a small piece of 
spacetime. 

Fig. 1.1 


More technically, choose a chart $\phi : U \to \mathbf{R}^4$ around any point $x$ in 
spacetime $M$ such that the resulting $\Gamma^i_{jk}$ all vanish at $x$. (Use normal 
coordinates around $x$; IX.2.05, 2.06.) Their vanishing for all $y \in U$ would mean $U$ 
was flat, so we cannot arrange that in general. However, they are continuous, 
so we can choose $U$ small enough to make all the $\Gamma^i_{jk}$ smaller than any given 
$\varepsilon > 0$. If $\varepsilon$ is given by the lower limit of our ability to detect curvature using, 
say, geodesic deviation, then the result is a region U that we may cautiously 
consider “flat for practical purposes”. Cautiously: we could sew all of M up 
out of such “almost flat” pieces, and the result need not be flat. Consider a 
Buckminster Fuller geodesic dome; either the short segments must be curved, 
just a little, or the faces bent a little away from each other along the edges 
(otherwise the dome is a plane). Arguments from the equivalence principle 
must be strictly local to have meaning. 
The chart $\phi$ with its “flat for local practical purposes” domain $U$ is 
often called a local Lorentz reference frame. Since “Lorentz” is being used 
to mean flat it would perhaps be less misleading to call it approximately 
Lorentz. Reasoning aimed at exact results about events in a local Lorentz 
frame should never use the assumption that its domain is exactly flat. 

Take then a box in $U$, sufficiently small that the measured accelerations 
due to gravity at different points are “parallel”, with the $\phi$-coordinates of 
its corners constant in time. (“Parallel” has a route-independent meaning in 
$U$ up to our limits of measurement, by assumption.) Let $V$ be the open set 
interior (at various times) to the box and $\psi = \phi|_V$, to give us a chart 
$\psi : V \to \mathbf{R}^4$ around $x$. Then the equivalence principle asserts that no experiment 
confined to $V$ by an observer A whose world line is given by $\psi$ as “rest in 
the box”, with the above limit to its accuracy, can distinguish his situation 
as being 

(1) That of an observer in a spacetime X he cannot in V distinguish 
from flat, with a velocity in X he cannot distinguish from constant, 
with a field present that acting alone would produce an acceleration 
of any matter in V parallel to its effect at any other point in V 
(though it is not acting alone, at least on him; some other agency 
is balancing it to keep his velocity constant). 

or (2) That of an accelerated observer in a flat spacetime (with some force 
such as a push from the floor of the box accelerating him), with no 
gravitational effects from outside U . 

Correspondingly, it says that an observer B not experiencing any forces 
from the walls, floor, acceleration couch etc. of the box by standing, hanging 
or lying on them cannot distinguish by similar measurements whether his 
situation is 

(1’) That of a non-inertial observer in a flat spacetime where some field 
is acting that, in the absence of other forces, would give all matter 
near him an acceleration parallel to his own. 

or (2’) That of an inertial observer in a flat spacetime where no such field 
is acting. 

(These forms of the principle are equivalent, since a permissible experiment 
for either A or B is to build a B or an A and have it report back.) 

By our method of constructing $\psi$, (1’) cannot be distinguished from 

(1”) That of an accelerated observer in a spacetime perhaps not flat, 
where some field is acting to accelerate him and, in the absence of 
other forces, matter near him, in a smoothly varying way. 

so we reach the indistinguishability of (1”) and (2’). 



The approximations involved vanish when we take the limit as the size of 
the box goes to zero, if all the “sufficiently small’s” are defined carefully. The 
principle then says that an observer falling freely under gravity in any 
spacetime finds the same local physical laws - that means the same relationships 
between the values of sets of measurable quantities and their derivatives, at 
the points he actually passes through - as an inertial observer studying the 
behaviour of similar quantities in flat spacetime, in the absence of gravity. 
What happens must be independent of the chart, of course, so in the general 
spacetime we must use some connection to get well defined derivatives. The 
“natural” choice, in the strong sense outlined in VIII.6.09, is the Levi-Civita 
connection for the metric given by measurement. In flat spacetime, of course, 
covariant differentiation is the same as ordinary differentiation. (Which for 
affine coordinates is given by just differentiating components, since the $\Gamma^i_{jk}$ 
vanish.) Thus we have lost the various motivating travelling boxes, observers 
etc. and come to the invariant statement: 

The local (differential, as distinct from integral) forms of the 
laws of physics are identical in the “presence” or “absence” of 
gravity - no “gravitational force” need be allowed for - provided 
covariant differentiation is used. 

This is the principle’s precise form, and any imprecisions in our earlier 
formulations can be resolved by reference to it. It is a vital tool in finding 
general relativistic forms for physical laws already studied in flat spacetime, 
often used in component form (1.02). 

The equivalence principle is not, of course, a necessary geometric fact 
like the conservation equation div E = 0 (X.6.12), but a scientific hypothesis 
to be tested. It currently seems impossible to describe gravitation without 
curved spacetime, but the principle asserts that gravitation consists only 
in the relation between matter and the curvature of spacetime. It is quite 
legitimate to suppose that there is a tensor field involved as well, associated 
with all matter as the electromagnetic field is associated with charged matter. 
This would make gravitation more complicated, since we cannot get rid of 
the curvature aspect, but perhaps it just is that complicated. For instance, 
differentiating with anything but the Levi-Civita $\nabla$ amounts to using $\nabla$ plus 
a $\binom{1}{2}$-tensor field, by Exercise VIII.6.2. 

See [Misner, Thorne and Wheeler] for a discussion of experimental tests 
of the equivalence principle. Here we shall assume it is true, since by the 
above it gives the simplest account of gravitation, and we investigate its 
consequences. 

1.02. Components. In flat spacetime with any affine chart, the $\partial_i$ are all 
parallel fields and the $\Gamma^i_{jk}$ vanish everywhere. Hence the components 
$w^{i_1 \ldots i_h}_{j_1 \ldots j_k;l}$ of the covariant derivative of a $\binom{h}{k}$-tensor $w$ reduce to the 
derivatives $w^{i_1 \ldots i_h}_{j_1 \ldots j_k,l}$ of the components (cf. VIII.7.08), and 
differentiation can be treated in an 
entirely component-by-component fashion. For this reason the equivalence 
principle is sometimes stated as “semi-colons must reduce to commas in the 
case of flat spacetime”. 

(Note that with “curvilinear coordinates” in flat spacetime, the $\partial_i$ are 
not parallel and the $\Gamma^i_{jk}$ thus do not vanish (Exercise 1).) 
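
As a numerical illustration of this note, here is a minimal sketch (not part of the original text) that evaluates the connection coefficients of the flat metric on $\mathbf{R}^2$ in polar coordinates. The names `polar_metric` and `christoffel` are our own; the metric is assumed diagonal, and partial derivatives are approximated by central differences. It should recover the familiar non-vanishing values $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$.

```python
def polar_metric(q):
    """Flat metric on R^2 in polar coordinates q = (r, theta): diag(1, r^2)."""
    r, _theta = q
    return [[1.0, 0.0], [0.0, r * r]]


def christoffel(metric, q, h=1e-5):
    """Return gamma[i][j][k] = Gamma^i_{jk} at the point q.

    Uses Gamma^i_{jk} = (1/2) g^{il} (d_j g_{lk} + d_k g_{lj} - d_l g_{jk}).
    Assumption: the metric is diagonal, so g^{ii} = 1 / g_{ii} and only l = i
    contributes. Derivatives are central differences with step h.
    """
    n = len(q)
    # dg[l][j][k] approximates the partial derivative of g_{jk} along coordinate l.
    dg = []
    for l in range(n):
        q_plus, q_minus = list(q), list(q)
        q_plus[l] += h
        q_minus[l] -= h
        g_plus, g_minus = metric(q_plus), metric(q_minus)
        dg.append([[(g_plus[j][k] - g_minus[j][k]) / (2 * h) for k in range(n)]
                   for j in range(n)])
    g = metric(q)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                gamma[i][j][k] = 0.5 / g[i][i] * (
                    dg[j][i][k] + dg[k][i][j] - dg[i][j][k])
    return gamma


if __name__ == "__main__":
    G = christoffel(polar_metric, (2.0, 0.3))
    print("Gamma^r_{theta theta} =", G[0][1][1])
    print("Gamma^theta_{r theta} =", G[1][0][1])
```

Index 0 stands for $r$ and index 1 for $\theta$. The non-zero output shows directly that commas and semi-colons differ in a non-affine chart, even though the space is flat.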

1.03. Free Fall. The simplest law of physics in flat spacetime is that a 
history $c$, on which no 4-force acts, has constant 4-velocity. This is actually a 
trivial consequence of the definition of 4-force which we made, being 
conditioned by Newton to seek a force as cause for any change in velocity. 
(Aristotle, by contrast, held that a force is needed to maintain velocity, which is 
closer to everyday experience. Only Newtonian relativity - the idea that any 
velocity can be chosen as “rest” - makes the newer idea intuitive.) 

The local form of this law is (VIII.3.05), 

$\nabla_{c^*} c^* = 0$ . 

So the equivalence principle generalises “a particle moving under no force in 
flat spacetime travels along an affine straight line” to “a history in a general 
spacetime, influenced only by gravitation, is a geodesic”. 

The movement of stars and planets has thus become less “forced” in our 
minds over the centuries. Mediaeval descriptions had the planets mounted on 
revolving crystal spheres, mounted on revolving spheres, mounted on ... etc., 
driven by some ultimate Primum Mobile (prime mover, somewhat identified 
with God) which supplied the Aristotelean force to keep them going. Newton 
had them falling freely around the sun, with no push on them; the only 
force active was universal gravitation. Finally, in general relativity even that 
constraining force disappears and the planets simply take their own course, 
not straying, at one with the geometrical Tao of spacetime. 

From this point of view, then, an observer lying in a hammock is, exactly, 
an accelerated observer. The only force acting on him is the upward push of 
the hammock, just as in Newtonian mechanics without gravity a stone in a 
sling is accelerated inward away from its inertial movement, until the sling 
is released and the stone flies off tangentially on a straight path at constant 
speed. “Straight” now becomes “geodesic” and the absence of gravity is not 
required. Things fall, not for a cause, but since without a cause they will not 
stand. 


Exercises XII. 1 

1. Does the “; $\mapsto$ ,” form 1.02 for the equivalence principle hold up if the 
chart used on flat spacetime is not affine? Compute the $\Gamma^i_{jk}$ for the 
usual connection on $\mathbf{R}^2$ in polar coordinates. 

2.01. Einstein’s Equation. We have decided that an apple falls because it 
is guided by the shape of spacetime. But why is this shape, around the earth, 
such that so many timelike geodesics meet the ground twice? (The past and 
future of a travelling golf ball both usually touch grass.) What form does 
the relation between the presence of matter and the curvature of spacetime 
take? 

First, we observe that the nature of the matter seems unimportant. 
Whether the matter is charged or uncharged, matter or antimatter, solid 
or gaseous etc., has no influence on its gravitational effect as far as any 
experiments have indicated. In Newton’s theory only mass was important; 
relativistically mass is inextricably mixed with energy and momentum. So the 
natural hypothesis is that however high or low the order of tensor needed to 
describe any “matter field”, its interaction with the geometry of spacetime 
depends only on the concomitant flow of 4-momentum; that is, its stress 
tensor. 

Secondly, the curvature of spacetime is clearly non-zero even at points 
where no matter is present. One can see this by the Schild argument of 
XI.5 or, once gravitation is assumed to be entirely a curvature effect, by 
the “tidal stresses” associated with geodesic deviation in X.4 or by the more 
general considerations of X.3.02. Now the purely geometric argument of 
the latter (the others appeal to experimental evidence, available only for our 
spacetime) arises only when a spacetime of at least three spacelike dimensions 
is considered, as we saw. With only two, matter could affect the geometry 
outside the histories of solid bodies without imposing curvature there, as in 
Exercise X.3.2; we could describe a “gravitation” that involved curvature only 
at its source, the matter. (Though whether intelligent creatures in a 
three-dimensional spacetime could ever find such a theory valid is another question. 
Exercise X.3.2f suggests that a star could not have planets in stable orbits 
around it, and g casts doubt on whether stars would even form. In this case 
there could be no “life as we know it” to consider the theory. Thus flatlanders 
would either be very different from us or find that whatever they used for 
gravity could be detected by purely local effects - “local” being bigger than 
V of 1.01 - away from its source, like ours with red shifts and tidal effects. 
Compare the end of [Misner, Thorne and Wheeler], on “biological selection 
of physical constants”.) 

Now in a 3-manifold curvature is completely determined by the Ricci 
tensor, by X.§7. As we go to four dimensions, more possibilities unfold: 

(1) The Weyl tensor, that gives the “non-Ricci” part of the curvature, 
no longer vanishes identically. 
(2) The “spanning surface” argument shows that for matter to bend 
space around it, R cannot vanish wherever matter isn’t. 

This suggests that we make the Ricci tensor the “locally determined part” 
of the curvature, whose value at $x$ is determined by $T_x$, and leave the Weyl 
tensor as the “non-locally determined part”, influenced here and now by the 
presence of the sun’s matter at a point 93 million miles and eight minutes 
off in spacetime by our usual labels. (Only suggests of course - we are 
motivating, not deriving, Einstein’s equation. Similarly, Maxwell’s equations 
come not from deduction but from Maxwell. Equations you can prove are 
either laws of geometry, not physics, or mere consequences of more 
fundamental principles. Such a derivation is possible for Einstein’s equation. For 
instance, from very little more than physically reasonable symmetries plus 
the hypothesis that only geometric effects appear in gravitation it is done in 
[Hojman, Kuchar and Teitelboim]. This work ought to be intelligible to a 
reader of this book who is familiar with Hamiltonian dynamics.) 

The simplest idea, then, would be to equate the Ricci tensor (adjusted to 
have the same variance) to the stress tensor. But by X.6.11 the divergence 
of the Ricci tensor $\mathbf{R}$ is $\tfrac{1}{2}dR$, where $R$ is the scalar curvature, which can 
only be zero if $R$ is constant. Since $\mathbf{R} = T$ would give 
$R = 0$ where there is no matter, this and the conservation law $\operatorname{div} T = 0$ 
(brought over from XI.3.07 by the equivalence principle) would imply $\operatorname{tr} T$ 
equal to $\operatorname{tr} \mathbf{R}$ equal to $R$ equal to 0 everywhere. But $T$ can very easily, 
physically, have all its diagonal entries (energy density and three pressure 
terms) positive in some coordinates, which gives $\operatorname{tr} T > 0$. 

Thus it is not physically plausible simply to equate $\mathbf{R}$ to $T$. However, the 
Einstein tensor $E$ introduced in X.6.12 has identically vanishing divergence, 
always, and by X.6.13 determines the Ricci curvature. So the equation 

(1) $E = 8\pi T$ 

($8\pi$ being purely a convenience to simplify units, like $4\pi$ in Maxwell’s 
equations) both describes a local effect of matter on curvature, and implies the 
conservation law 

div T = 0 . 

(1) is the original Einstein’s equation. It is the most general relation 
possible between $T$ and curvature that implies the conservation law ($E$ being 
essentially the only $\binom{1}{1}$-tensor with zero divergence constructible from R) 
except for the modification 

(2) $E = 8\pi T + \Lambda I$ 

where $\Lambda \in \mathbf{R}$ and $I$ is the identity tensor field $T^*M \to T^*M$. Einstein 
transferred his affections for a while to this, as we see in 2.02, but we shall 
call only (1) by the name “Einstein’s equation”. Note that either version is 
often referred to in the plural, because it is represented by sixteen equations 
in coordinates. 

2.02. Static Solutions? Let us look for a solution where spacetime $M$ can 
be sliced into spacelike hypersurfaces $S$ (cf. VII.2.03), each containing only 
dust at $S$-rest. That is, if we choose around $x$ a chart with the timelike vector 
$\partial_0$ orthogonal to $S$ we should have $T_x$ given as a matrix with the “energy 
density” $T^0_0$ as its only non-zero entry: no “energy flow” and no “internal 
forces”. If we also require the $\partial_i$ orthogonal at $x$, then $(T^0_0)_x$ subject to this 
“$S$-rest” condition for the chart is well defined, giving a function $\rho : M \to \mathbf{R}$ 
with $(T^0_0)_x = \rho(x)$. Einstein’s equation gives, by X.6.13, 

$\mathbf{R} = E - \tfrac{1}{2}(\operatorname{tr} E)I = 8\pi\bigl(T - \tfrac{1}{2}(\operatorname{tr} T)I\bigr)$ 

so using the above chart at $x$ we see that 

$R^0_0 = 8\pi(T^0_0 - \tfrac{1}{2}T^0_0) = 4\pi\rho$ , since $\operatorname{tr} T = T^0_0$ 

$R^i_i = 8\pi(0 - \tfrac{1}{2}T^0_0) = -4\pi\rho$ , $i = 1,2,3$ (no sum). 

More realistically, let the sections contain gas at $S$-rest with pressure $p$, so 
that $T^0_0 = \rho$, $T^1_1 = T^2_2 = T^3_3 = -p$, off-diagonal terms vanishing (cf. XI.3.05). 
Then 

$R^0_0 = 8\pi(T^0_0 - \tfrac{1}{2}\operatorname{tr} T) = 8\pi(\rho - \tfrac{1}{2}(\rho - 3p)) = 4\pi(\rho + 3p)$ 
$R^i_i = 8\pi(T^i_i - \tfrac{1}{2}\operatorname{tr} T) = 8\pi(-p - \tfrac{1}{2}(\rho - 3p)) = 4\pi(p - \rho)$ 

in the same coordinates at $x \in M$. 
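
The arithmetic of the dust and gas cases can be checked mechanically. The following is a minimal sketch, assuming the diagonal mixed components used above ($T^0_0 = \rho$, $T^i_i = -p$, written `rho` and `p` below); the function names `ricci_diag`, `dust` and `gas` are our own, not the book's.

```python
import math


def ricci_diag(T_diag):
    """Diagonal mixed Ricci components from R = 8*pi*(T - (1/2)(tr T) I).

    T_diag lists the diagonal mixed components (T^0_0, T^1_1, T^2_2, T^3_3);
    off-diagonal terms are assumed to vanish.
    """
    trace = sum(T_diag)
    return [8 * math.pi * (t - 0.5 * trace) for t in T_diag]


def dust(rho):
    """Stress tensor of dust at rest: energy density only."""
    return [rho, 0.0, 0.0, 0.0]


def gas(rho, p):
    """Stress tensor of gas at rest with isotropic pressure p."""
    return [rho, -p, -p, -p]


if __name__ == "__main__":
    rho, p = 1.0, 0.1
    R_dust = ricci_diag(dust(rho))
    R_gas = ricci_diag(gas(rho, p))
    print("dust:", R_dust[0], "expected", 4 * math.pi * rho)
    print("gas R^0_0:", R_gas[0], "expected", 4 * math.pi * (rho + 3 * p))
    print("gas R^i_i:", R_gas[1], "expected", 4 * math.pi * (p - rho))
```

The values `rho = 1.0` and `p = 0.1` are arbitrary illustrative inputs in geometrised units.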

Now, Einstein, convinced that the heavens endure from everlasting to 
everlasting and seeking to approximate the thin scattering of matter observed 
in the universe by such a dust or gas, wanted a static solution. That is, like 
the above dust or gas ones with the further condition that M can be expressed 
as the product manifold S x R, for a particular model S of space, and that 
the maps 

$Z_t : S \times \mathbf{R} \to S \times \mathbf{R} : (x, r) \mapsto (x, t - r)$ 

preserve all geodesics, etc. for all $t \in \mathbf{R}$. This would allow spacelike 
hypersurfaces of the “constant” form $S \times \{t\} \subset M$, $t \in \mathbf{R}$. 

However, this implies that the Ricci curvature in the direction of the 
“static” timelike vectors is zero (Exercise 1), and hence that $R^0_0 = 0$ in the 
above coordinates at $x \in M$. So either no matter is present, in the “dust” 
solution, or the “gas” solution is under tension (the second contrary to 
observation, the first contrary even to observers), or the solutions are not 
static. The only way this can be, if the matter is at $S$-rest in each $S$, is by 
differences between the sections $S$ themselves. If the matter is evenly spread 
in $S$ it has a finite size (cf. 2.03) and this is increasing or decreasing with 
time - or perhaps just changing from increase to decrease. (Compare the 
way the North-South curvature of the Earth “forces” variation in the size of 
parallels of latitude.) 

Einstein found this behaviour on the part of the solutions so 
reprehensible that he put a fudge factor in his equation to prevent it; he inserted 
the “cosmological constant” $\Lambda$ to keep the cosmos constant (equation 2 of 
2.01 above). If the density and pressure of the gas are nicely tailored to the 
cosmological constant of the host spacetime by arranging 

$4\pi(\rho + 3p) = \Lambda$ 

everywhere, then 

$R^0_0 = (8\pi T^0_0 + \Lambda) - \tfrac{1}{2}(8\pi \operatorname{tr} T + 4\Lambda)$ since $\operatorname{tr} I = 4$ 
$= 4\pi(\rho + 3p) - \Lambda$ 
$= 0$ , 

so the heavens can endure. Of course, with the new equation, if $\Lambda \neq 0$ an 
empty universe must expand or contract. The universe requires just the right 
amount of matter to keep it steady. 
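
The balancing act can be illustrated numerically. This is a sketch only, under the same assumptions as the text (diagonal mixed components $T^0_0 = \rho$, $T^i_i = -p$, modified equation $E = 8\pi T + \Lambda I$); the names `ricci_with_lambda` and `balancing_lambda` are hypothetical helpers of our own.

```python
import math


def ricci_with_lambda(rho, p, lam):
    """Return (R^0_0, R^i_i) from E = 8*pi*T + lam*I and R = E - (1/2)(tr E) I."""
    E = [8 * math.pi * rho + lam] + [-8 * math.pi * p + lam] * 3
    trace_E = sum(E)  # equals 8*pi*(rho - 3p) + 4*lam, since tr I = 4
    R = [e - 0.5 * trace_E for e in E]
    return R[0], R[1]


def balancing_lambda(rho, p):
    """The value of the cosmological constant that makes R^0_0 vanish."""
    return 4 * math.pi * (rho + 3 * p)


if __name__ == "__main__":
    rho, p = 1.0, 0.1
    lam = balancing_lambda(rho, p)
    print("tuned universe:", ricci_with_lambda(rho, p, lam))
    # An empty universe with a non-zero constant is left with R^0_0 = -lam.
    print("empty universe:", ricci_with_lambda(0.0, 0.0, 0.5))
```

With the tuned constant the timelike component vanishes while the spacelike ones equal $4\pi(p - \rho) - \Lambda$; with no matter and $\Lambda \neq 0$ the timelike component is $-\Lambda$, so the static condition fails.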

Notice that $4\pi(\rho+3p)$ must be constant over $M$ for this to work, because 
for $\Lambda$ a function on $M$ we have $\operatorname{div}(\Lambda I) = d\Lambda$ (Exercise 2), so the equation 
does not guarantee $\operatorname{div} T = 0$ if $\Lambda$ is not a constant. In fact energy density 
and pressure must be constant individually, for the following reason. 

The “static” requirement above implies here that parallel transport by 
$\nabla^M$ of a vector tangent to $S \times \{t\}$ along a curve in $S \times \{t\}$ keeps it tangent to 
$S \times \{t\}$ (in contrast to Euclidean parallel transport of tangent vectors around 
an embedded sphere, for example). It follows immediately that the Riemann 
and Ricci tensors of $S \times \{t\}$ with the induced metric are exactly given by 
restricting the Riemann and Ricci tensors of $M$ (false for the sphere in $\mathbf{R}^3$). 
If the $S$-spacelike part of $T$ is isotropic, which we have assumed for both the 
“dust” and “gas” solutions (specifically, taking it as 0 and $pI$ respectively) 
then so is the $S$-spacelike part of the Ricci tensor of $M$, using Einstein’s 
equation either with or without the cosmological constant. Hence the whole 
of the Ricci tensor of $S \times \{t\}$ is isotropic. It is therefore constant, by X.6.15. 
(It was in this physical context, of point isotropy of matter at rest implying 
global homogeneity, that Einstein manifolds first came to attention; hence the 
name.) Its components, $R^i_i = 4\pi(p-\rho)-\Lambda$ in coordinates as above, are thus 
constant too. Combining this with the constancy of $4\pi(\rho + 3p)$ demanded 
above, $\rho$ and $p$ must be constant over each $S \times \{t\}$. Constancy over $M$ 
follows easily by the conservation law. 
This result, that if the pressure and energy density of a gas are not 
everywhere the same the situation is not static, would not seem intuitively 
obvious. But intuition unsupported, even of the great, can be wrong: 
Einstein’s intuition of the stability of the universe made him avoid predicting the 
recession of the further galaxies, and the consequent, now famous, red shift 
that Hubble observed some ten years later. Subsequent measurements show 
that if $\Lambda$ is non-zero then it is very small indeed: we shall take it as zero. 

2.03. The Shape of Space. Looking at the matter of the Universe, we see 
it getting less dense - the further a galaxy is from us, the faster the distance 
between us and it is growing. So distances on a spacelike hypersurface S 
between galaxies at $S$-rest (to the approximation involved in treating matter 
as a thin gas) are increasing, rather like distances on an inflating balloon. 
Can we say that the spacelike sections as a whole really are becoming larger 
with time, that “the universe itself is expanding”, rather than that matter is 
spreading out in infinite space? 

On certain assumptions, yes. Namely, assume that there is a spacelike 
section S passing through us with the same sort of complete homogeneity, 
averaging on a large enough scale, as the sections of the static solutions 
in 2.02. (Similar relations between “isotropic” and “homogeneous” apply, 
but more complicated since spacetime is not “flat in the time direction”.) 
This is sometimes called the Copernican principle, by analogy with the way 
Copernicus dislodged Earth from the centre of things, then the sun became 
an average star off-centre in the galaxy, and finally our galaxy was seen 
as just an average member of an average galactic cluster. The existence 
at all of a spacelike hypersurface S for which all matter is more or less at 
$S$-rest is quite a strong assumption, and the transition from “we are nowhere 
special” to “there is nowhere special” is a little suspect, particularly as it 
leads eventually to the conclusion (cf. 2.04) that there are spacetime points 
in our past and our future that are quite drastically special. However, we can 
see quite far these days with various devices, and what we see is homogeneity 
(allowing for the way signal delay shows us earlier, denser spacelike sections) 
over a very substantial volume of space. On the available evidence then, the 
Copernican principle is a plausible assumption. 

Now, using the values for $R^0_0$ and $R^i_i$ of the “gas” solution in 2.02, 
we get sectional curvatures for the $(i, j)$-planes, $i, j > 0$, of $4\pi(p - \tfrac{\rho}{3})$ 
(Exercise 3). Now in these circumstances, where the matter is so thinly spread, 
$\rho$ is very much larger than $p$; recall the $c^2$ term (XI.2.08) involved in 
non-geometrised units. (One microgram of hydrogen in a cubic metre represents 
H-bomb amounts of energy but a pressure needing fine instruments even to 
detect.) So the spacelike sectional curvatures are everywhere the same 
negative number, on $S$. Since our metric is negative on spacelike directions, 
reversing the sign of scalar curvature, this means that the spacelike sectional 
curvatures of $M$ at points in $S$ correspond to positive scalar curvature for 
the positive definite situation of the surface in X.§3. $M$ is bent in spacelike 
directions in the manner of a sphere, rather than a flat space or a saddle 
shape. 

Now it is a fact that only certain Riemannian manifolds are candidates 
for $S$, given constant curvature of this kind, up to a scalar constant 
multiplying the metric. These are the sphere $S^3$ and various “smaller” spaces 
constructed from it by identifying points. For example real projective 3-space 
$\mathbf{RP}^3$ (the 3-dimensional analogue to Exercise IX.1.3) can be constructed by 
identifying opposite points of $S^3$, or as the group of rotations of Euclidean 
3-space. The latter construction gives an easy way to specify further 
identification. For example, identify rotations $A$ and $B$ if $A = B \circ C$, where $C$ 
is a symmetry rotation of the dodecahedron. Or if $A = B \circ C$ where $C$ is a 
rotation of ^ about a given axis or ... . All such constructions give 
candidates for $S$, since they are locally just like $S^3$ - which is what we know about 
$S$ - and only spaces obtained by such identifications of points in $S^3$ (not 
always going via $\mathbf{RP}^3$) are candidates. Their classification (essentially that 
of finite groups acting suitably on $S^3$ - of which there are infinitely many) is 
outside our scope. However, in each case $S$ has a meaningful finite 
“circumference” and “volume” deducible from the local value of the scalar curvature, 
and so is “finite but unbounded” in the classical phrase of Einstein. It has 
finite “size” but no boundary. (The timelike aspects of curvature then imply 
that this “size” cannot be constant from spacelike hypersurface to spacelike 
hypersurface unless $\Lambda$ is just so, as we have seen.) 

We cannot prove global results of this kind in this volume, so we only 
mention further the fact that the isotropy/homogeneity on $S$ can be 
weakened, as one would hope: if it had to be exactly true it would have little 
physical relevance, as the matter we see is not exactly a uniform thin gas. 
If sectional curvatures on a Riemannian manifold $S$ vary between positive $k$ 
and $K > k$, we can change scale to make the upper bound 1, and set $\delta = k/K$. 
(Then $0 < \delta \le$ curvatures $\le 1$ and $S$ is called a $\delta$-pinched manifold.) The 
question of how small $\delta$ can be and leave intact the topological conclusion 
that the space is $S^3$ (with perhaps some points identified) is a topic of active 
mathematical research. In the case $\dim S = 3$ the latest, smallest value for 
which we have heard of a proof at the time of writing is Evidently $\delta$ must 
be strictly positive, as curvature even strictly greater than $\delta = 0$ does not 
imply compactness, let alone sphericity. (Consider the “bowls” - draw one - 

$\{(x,y,z) \in \mathbf{R}^3 \mid x^2 + y^2 = z^2 - 1,\ z > 0\}$ , 

$\{(x^1,x^2,x^3,x^4) \in \mathbf{R}^4 \mid (x^1)^2 + (x^2)^2 + (x^3)^2 = (x^4)^2 - 1,\ x^4 > 0\}$ , 
with the metrics induced from the Euclidean ones on $\mathbf{R}^3$ and $\mathbf{R}^4$.) 
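
For the two-dimensional bowl the claim can be checked by hand or by machine. The sketch below is our own illustration, assuming the standard formula $K = f'f''/\bigl(r(1+f'^2)^2\bigr)$ for the Gaussian curvature of a surface of revolution $z = f(r)$ in Euclidean $\mathbf{R}^3$; the function names are hypothetical. For the bowl, $f(r) = \sqrt{1 + r^2}$, and the formula reduces to $K = 1/(1 + 2r^2)^2$: strictly positive everywhere, but with infimum 0, on a surface that is not compact.

```python
import math


def bowl_height(r):
    """Height z(r) on the bowl x^2 + y^2 = z^2 - 1, z > 0, at radius r."""
    return math.sqrt(1.0 + r * r)


def gauss_curvature_of_revolution(f, r, h=1e-4):
    """Gaussian curvature of the graph z = f(r) rotated about the z-axis.

    Uses K = f' f'' / (r (1 + f'^2)^2) for r > 0, with f' and f'' approximated
    by central differences of step h.
    """
    f1 = (f(r + h) - f(r - h)) / (2 * h)
    f2 = (f(r + h) - 2 * f(r) + f(r - h)) / (h * h)
    return f1 * f2 / (r * (1.0 + f1 * f1) ** 2)


def bowl_curvature_exact(r):
    """Closed form for the bowl: K = 1 / (1 + 2 r^2)^2."""
    return 1.0 / (1.0 + 2.0 * r * r) ** 2


if __name__ == "__main__":
    for r in (0.5, 1.0, 3.0, 10.0):
        numeric = gauss_curvature_of_revolution(bowl_height, r)
        print(r, numeric, bowl_curvature_exact(r))
```

The printed curvatures decrease towards zero as $r$ grows, so no positive lower bound $\delta$ exists for this surface.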
Notice that if $S$ is an $\mathbf{RP}^3$ it will not embed in $\mathbf{R}^4$, so its curvature must 
be “around” several dimensions if “around” any (cf. X.1.08). Furthermore, 
if the Copernican principle is true then it implies that the “measurement” 
metric tensor given by anything remotely like general relativity must have 
the properties that lead to $S$ being $S^3$ or a closely related space, which does 
not admit a locally flat metric. So it is very unlikely that any theory of 
gravitation in a flat spacetime is compatible with the Copernican principle, 
even if the flatness is not supposed physically detectable. 

The favorite topology for spacelike sections among cosmologists is that 
of $S^3$ (the simplest of the above spaces, and the “universal cover” of them 
all, as $\widetilde{SL}(2;\mathbf{R})$ is of $SL(2;\mathbf{R})$; cf. IX.6.07). Another common choice is to 
deny the Copernican principle and suppose that the universe consists of a 
finite amount of matter in the midst of infinite darkness: that there is not 
enough matter to “close up” space by the curvature it causes, and that on a 
large enough scale spacetime approximates Minkowski space arbitrarily well. 
More exactly, that the geometry of M\K , where M is a spacetime and K 
is a region (including most of the matter) that has a compact intersection 
with any spacelike hypersurface, approximates the geometry of a piece of 
Minkowski space arbitrarily well for K large enough. Such a spacetime is 
called asymptotically flat, and is nice for coordinate calculations (needing 
only one chart), though it feels somehow rather lonely. Many results have 
been proved for asymptotically flat spacetimes, see [Hawking and Ellis]. 

2.04. The Shape of Spacetime. The Copernican principle, plus mild and 
reasonable physical conditions on the matter tensor, implies that there are 
regions in both our past and our future (joined to us by backward and by 
forward curves from here and now) where T and R grow without bound. A 
manifold cannot have infinite curvature and still be a manifold, so time must 
have a stop for some observers (such as those falling into black holes) or for 
all (final collapse of the universe). Likewise the past contains singularities; 
probably a Big Bang (cf. IX.3.03). 

For more precise statements of these facts the reader is referred to 
[Hawking and Ellis], which is devoted almost entirely to their discussion and proof. 


Exercises XII. 2 

1. a) Show that the symmetries $Z_t$ of 2.02 imply also that 

$T_t : S \times \mathbf{R} \to S \times \mathbf{R} : (x, r) \mapsto (x, r + t)$ 

preserves geodesics etc. for any $t$. Use the symmetry $Z_{t_0}$ and the flow 
$Q : S \times \mathbf{R} \times \mathbf{R} \to S \times \mathbf{R} : (s, r, t) \mapsto (s, r + t)$ to establish the nature 
of parallel transport around $S \times \{t_0\}$ (2.02) and deduce that of the 
connection coefficients and Ricci curvatures. 

2. Show that for $M$ with any metric tensor and $f : M \to \mathbf{R}$, we always 
have $\operatorname{div}(fI) = df$. (Two lines in coordinates.) 

3. In 2.03 assume that the sectional curvature for any plane tangent to $S$ 
is the same number, $k$ say. Further assume that the sectional curvature 
for any plane in $T_xM$ containing the “$S$-rest velocity” timelike vector 
orthogonal to $S$ is $k'$. Hence show that $k = 4\pi(p - \tfrac{\rho}{3})$. (Remember 
the minus signs in G and Gj.) 

3. The Stars in Their Courses 

How is spacetime shaped in vacuum around a concentrated body of matter, 
such as the earth or the sun? Since the Einstein and Ricci tensors vanish there 
by Einstein’s equation, this means: what must the Weyl tensor (which is then 
all of R) look like in such a region? The answer is obviously not unique - for 
instance the curvature around the earth is affected by the distant presence 
of the sun - unless we set boundary conditions giving the effect of other 
bodies outside our region of solution, and any “background field”. This non-local
determination of the Weyl tensor by matter, the governing equation
being div C = J where J is a function of T (Exercise 1), is analogous to
the non-local determination of the electromagnetic field by moving charges,
governed by Maxwell’s equations. However since the Weyl tensor is coupled
to the shape of the underlying spacetime, gravitational effects “add” in a
much more complicated way than electromagnetic ones. As a solvable first
approximation, then, assume that the sun is alone in an asymptotically flat
spacetime, with only “test particles of negligible mass” moving around to study
the geometry outside it.

3.01. The Schwarzschild Solution. We look for a static, spherically symmetric
solution, around an unrotating spherical star of radius $r_0$ alone in
space. That is, we take spherical coordinates $(r, \theta, \phi)$ on $\mathbf{R}^3$ (Fig. 3.1)
and corresponding “hypercylindrical” coordinates $(t, r, \theta, \phi)$ on $\mathbf{R}^4$. Then



Fig. 3.1 
we seek an asymptotically flat Lorentz metric on $\mathbf{R}^4$, dependent only on r.
(Notice that these coordinates are not everywhere defined. Nor are they,
strictly, given by a chart, since $\phi$ takes values in $S^1$. But locally they
correspond to a chart.) We assume further that the spheres of the form
$\{ (t, r, \theta, \phi) \mid t = t_a,\ r = r_a \}$ have the usual metric for spheres of radius $r_a$,
given by $(ds)^2 = -r^2((d\theta)^2 + \sin^2\theta\,(d\phi)^2)$. (There is no loss of generality in
this assumption, since given the spherical symmetry they must have constant
curvature and hence a scalar multiple of the usual metric for spheres, as long
as their circumferences always increase for increases in $r_a$: we could always
reparametrise r. The negative sign reflects the fact that changes purely in $\theta$
or $\phi$ are in spacelike directions.) Symmetry means that there can be no
off-diagonal spacelike/spacelike terms in the matrix for G in these coordinates.
The static condition tells us that for any $t_0$ we have a symmetry


$$(t, r, \theta, \phi) \mapsto (t_0 - t, r, \theta, \phi)$$


which similarly implies that $\partial_t$ is orthogonal to $\partial_r$, $\partial_\theta$, $\partial_\phi$ (cf. VII.§4 on
“name” indices). Thus there are no off-diagonal terms at all, and we are
seeking a metric of the form

$$(ds)^2 = f(r)(dt)^2 - h(r)(dr)^2 - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2 \quad\text{at } (t, r, \theta, \phi),$$

where f and h are functions $\mathbf{R} \to \mathbf{R}$.

We look first for a solution outside the region $\{ (t, r, \theta, \phi) \mid r < r_0 \}$
supposed to contain the matter, with f(r) and h(r) positive. The “asymptotically
flat” requirement implies

$$\lim_{r \to \infty} f(r) = 1 = \lim_{r \to \infty} h(r),$$

since the usual Minkowski metric is in these coordinates

$$(ds)^2 = (dt)^2 - (dr)^2 - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2.$$

For technical convenience, we work with the natural logarithms of f
and h, setting $f(r) = e^{\lambda(r)}$, $h(r) = e^{\xi(r)}$. Computation of the Ricci tensor
(Exercise 2) gives as its non-identically-zero components


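$$R_{tt} = \left( \frac{1}{2}\frac{d^2\lambda}{dr^2} + \frac{1}{4}\left(\frac{d\lambda}{dr}\right)^2 - \frac{1}{4}\frac{d\lambda}{dr}\frac{d\xi}{dr} + \frac{1}{r}\frac{d\lambda}{dr} \right) e^{\lambda - \xi} \tag{R1}$$

$$R_{rr} = -\frac{1}{2}\frac{d^2\lambda}{dr^2} - \frac{1}{4}\left(\frac{d\lambda}{dr}\right)^2 + \frac{1}{4}\frac{d\lambda}{dr}\frac{d\xi}{dr} + \frac{1}{r}\frac{d\xi}{dr} \tag{R2}$$

$$R_{\theta\theta} = -\left( 1 + \frac{r}{2}\left(\frac{d\lambda}{dr} - \frac{d\xi}{dr}\right) - e^{\xi} \right) e^{-\xi} \tag{R3}$$

$$R_{\phi\phi} = -\left( 1 + \frac{r}{2}\left(\frac{d\lambda}{dr} - \frac{d\xi}{dr}\right) - e^{\xi} \right) e^{-\xi}\sin^2\theta. \tag{R4}$$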


Setting these equal to zero, since the Ricci tensor must vanish by X.6.13
if the Einstein tensor vanishes, we get

$$\frac{d\lambda}{dr} = -\frac{d\xi}{dr}$$

by combining R1 and R2, hence $\lambda(r) = -\xi(r) + a$. (Since the domain of $\lambda$
and $\xi$ is connected, only one constant is needed: cf. Exercise VII.5.01.) As
$r \to \infty$, $f(r), h(r) \to 1$ by hypothesis, hence $\lambda(r), \xi(r) \to 0$ so a can only be
zero. Hence R3 gives

$$1 + r\frac{d\lambda}{dr} - e^{-\lambda} = 0.$$

Since

$$\frac{df}{dr} = \frac{d}{dr}(e^{\lambda}) = e^{\lambda}\frac{d\lambda}{dr} = f\frac{d\lambda}{dr},$$

this gives

$$1 + \frac{r}{f}\frac{df}{dr} - \frac{1}{f} = 0,$$

that is

$$\frac{df}{dr} = \frac{1 - f}{r},$$

whose general solution is

$$f(r) = \left(1 - \frac{k}{r}\right),$$

for some constant k. So the metric is given by the line element

$$(ds)^2 = \left(1 - \frac{k}{r}\right)(dt)^2 - \left(1 - \frac{k}{r}\right)^{-1}(dr)^2 - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2,$$

outside the star.
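
The last step of the derivation can be checked numerically. The following is only a sketch, not part of the original argument: it tests that $f(r) = 1 - k/r$ satisfies $df/dr = (1 - f)/r$, using a central difference for the derivative. The value of k, the sample radii and the step size are illustrative assumptions.

```python
def f(r, k):
    """Candidate solution f(r) = 1 - k/r from the text."""
    return 1.0 - k / r


def ode_residual(r, k, rel_step=1e-4):
    """Residual of df/dr = (1 - f)/r, written as r*f' + f - 1.

    f' is approximated by a central difference with step rel_step * r.
    """
    h = rel_step * r
    dfdr = (f(r + h, k) - f(r - h, k)) / (2.0 * h)
    return r * dfdr + f(r, k) - 1.0


k = 2.0  # illustrative constant (an assumption); the text goes on to set k = 2M
residuals = [ode_residual(r, k) for r in (3.0, 10.0, 100.0, 1.0e4)]
far_value = f(1.0e12, k)  # should be close to 1: asymptotic flatness
```

The residuals vanish up to discretisation error, and `far_value` approaches 1, in line with the “asymptotically flat” requirement.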

This is the Schwarzschild solution of Einstein’s equation, satisfying it
in vacuum. The constant k depends on the matter in the region $r < r_0$: if
k is zero, the metric reduces to that of flat spacetime. Solving Einstein’s
equation for the inside of the star, for which we refer the reader to [Misner,
Thorne and Wheeler], leads to the value k = 2M, where M is the “total
mass-energy of the star” in appropriate units¹. (A bad choice of units brings


¹ Eddington once caused consternation at a learned meeting by referring to the mass of
the sun as about “1.45 kilometres” - perfectly valid in geometrised units, cf. Exercise 6.
But its radius $r_0$ is much bigger than its mass.

in a “universal gravitational constant” G, just as choosing units of length
and time independently leads to a non-unity “universal limiting velocity” c 
in XI.2.08.) The concept of “total mass energy” is a bit more subtle than 
one might first guess - in fact only if the star is alone in asymptotically flat 
space, as here assumed, does it have an invariant meaning - but to examine 
it more closely would require the integral techniques we have agreed to defer. 
(“Total” implies that something is integrated over the star.) Therefore, we 
simply examine here the geometry of the metric 

$$(ds)^2 = \left(1 - \frac{2M}{r}\right)(dt)^2 - \left(1 - \frac{2M}{r}\right)^{-1}(dr)^2 - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2$$

where M is some positive constant associated with the star. It turns out 
that this M coincides with the solar mass which, used in Newton’s theory, 
gives the orbits best approximating the geodesics that we study below. In 
this sense, the “mass” can be found by analysis of orbits. 

Notice that if $r_0 < 2M$ our region of interest includes points where f
and h are negative and have no logarithms, so invalidating our method, but
a direct check shows that we still have a solution. More critically, for r = 2M
the metric as given is undefined, and hence not a metric at all. The fault,
however, is not in the metric but in the coordinates: it is strictly analogous
to the way the spherical metric $((d\theta)^2 + \sin^2\theta\,(d\phi)^2)$ appears indefinite
when $\theta$ is 0 or $\pi$. But we are here concerned with motions around an uncollapsed
star with radius $r_0 > 2M$. (Our own sun is an example for which the
Schwarzschild radius 2M is about 2.95 kilometres.) We refer the reader especially
to [Hawking and Ellis] for a careful treatment of that large subject,
the fascinating geometry of black holes, as situations including criticalities
like r = 2M in this solution are called.

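3.02. Schwarzschild Geodesics. The non-zero Christoffel symbols for the
Levi-Civita connection of the Schwarzschild metric are, by Exercise 3,

$$\Gamma^t_{tr} = \Gamma^t_{rt} = \frac{M}{r(r - 2M)}$$

$$\Gamma^r_{tt} = \frac{M(r - 2M)}{r^3}, \qquad \Gamma^r_{rr} = -\frac{M}{r(r - 2M)}$$

$$\Gamma^r_{\theta\theta} = 2M - r, \qquad \Gamma^r_{\phi\phi} = \sin^2\theta\,(2M - r)$$

$$\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = \Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = \frac{1}{r}$$

$$\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$$

$$\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta.$$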
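If a curve c is given by $c(\sigma) = (c^t(\sigma), c^r(\sigma), c^\theta(\sigma), c^\phi(\sigma))$ then the condition
IX.1.04 that c be a geodesic thus becomes

$$\frac{d^2c^t}{d\sigma^2} + \frac{2M}{c^r(c^r - 2M)}\frac{dc^t}{d\sigma}\frac{dc^r}{d\sigma} = 0 \tag{i}$$

$$\frac{d^2c^r}{d\sigma^2} + \frac{M(c^r - 2M)}{(c^r)^3}\left(\frac{dc^t}{d\sigma}\right)^2 - \frac{M}{c^r(c^r - 2M)}\left(\frac{dc^r}{d\sigma}\right)^2 - (c^r - 2M)\left[\left(\frac{dc^\theta}{d\sigma}\right)^2 + \sin^2 c^\theta\left(\frac{dc^\phi}{d\sigma}\right)^2\right] = 0 \tag{ii}$$

$$\frac{d^2c^\theta}{d\sigma^2} + \frac{2}{c^r}\frac{dc^r}{d\sigma}\frac{dc^\theta}{d\sigma} - \sin c^\theta\cos c^\theta\left(\frac{dc^\phi}{d\sigma}\right)^2 = 0 \tag{iii}$$

$$\frac{d^2c^\phi}{d\sigma^2} + \frac{2}{c^r}\frac{dc^r}{d\sigma}\frac{dc^\phi}{d\sigma} + 2\cot c^\theta\,\frac{dc^\theta}{d\sigma}\frac{dc^\phi}{d\sigma} = 0. \tag{iv}$$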

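If for some $\sigma_0 \in \mathbf{R}$ we have $c^\theta(\sigma_0) = \frac{\pi}{2}$, $\frac{dc^\theta}{d\sigma}(\sigma_0) = 0$, then for all $\sigma$,
$c^\theta(\sigma) = \frac{\pi}{2}$ (why?). So we can without loss of generality suppose this, since
we can always choose coordinates so as to make $c^\theta(\sigma_0) = \frac{\pi}{2}$ and $c^*(\sigma_0)$
orthogonal to $\partial_\theta$. The equations reduce, for orbits thus “lying in the hyperplane
$\theta = \frac{\pi}{2}$” (though strictly M has no hyperplane, not being affine) to

$$\frac{d^2c^t}{d\sigma^2} + \frac{2M}{c^r(c^r - 2M)}\frac{dc^t}{d\sigma}\frac{dc^r}{d\sigma} = 0 \tag{G1}$$

$$\frac{d^2c^r}{d\sigma^2} + \frac{M(c^r - 2M)}{(c^r)^3}\left(\frac{dc^t}{d\sigma}\right)^2 - \frac{M}{c^r(c^r - 2M)}\left(\frac{dc^r}{d\sigma}\right)^2 - (c^r - 2M)\left(\frac{dc^\phi}{d\sigma}\right)^2 = 0 \tag{G2}$$

$$\frac{d^2c^\phi}{d\sigma^2} + \frac{2}{c^r}\frac{dc^r}{d\sigma}\frac{dc^\phi}{d\sigma} = 0. \tag{G3}$$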


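But

$$\frac{d}{d\sigma}\left((c^r)^2\frac{dc^\phi}{d\sigma}\right) = (c^r)^2\left(\frac{d^2c^\phi}{d\sigma^2} + \frac{2}{c^r}\frac{dc^r}{d\sigma}\frac{dc^\phi}{d\sigma}\right),$$

hence G3 integrates by inspection to

$$(c^r)^2\frac{dc^\phi}{d\sigma} = \text{constant} = A, \text{ say} \tag{G3'}$$

(“Conservation of angular momentum”). Similarly G1 integrates immediately to

$$\left(1 - \frac{2M}{c^r}\right)\frac{dc^t}{d\sigma} = \text{constant} = B \tag{G1'}$$

(cf. Exercise 4.)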


The constants A and B are fixed if we know “initial conditions” $c(\sigma_0)$
and $c^*(\sigma_0)$ for some $\sigma_0 \in \mathbf{R}$: evidently they may take any real values for an
arbitrary geodesic, though B = 0, for instance, implies that the geodesic is
spacelike.
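
The two first integrals can be watched numerically. The sketch below is an illustration, not the book's method: it integrates the equatorial geodesic equations G1, G2, G3 with a classical fourth-order Runge-Kutta step and compares $A = (c^r)^2\,dc^\phi/d\sigma$, $B = (1 - 2M/c^r)\,dc^t/d\sigma$ and the squared length of the tangent vector at the start and end. The mass M = 1, the initial data and the step size are arbitrary assumptions chosen to give a bound orbit outside r = 2M.

```python
import math

M = 1.0  # geometrised mass; an illustrative value (assumption)


def rhs(y):
    """Equations G1-G3 as a first-order system in (t, r, phi, t', r', phi')."""
    t, r, phi, dt, dr, dphi = y
    ddt = -2.0 * M / (r * (r - 2.0 * M)) * dt * dr                 # G1
    ddr = (-M * (r - 2.0 * M) / r**3 * dt**2
           + M / (r * (r - 2.0 * M)) * dr**2
           + (r - 2.0 * M) * dphi**2)                              # G2
    ddphi = -2.0 / r * dr * dphi                                   # G3
    return [dt, dr, dphi, ddt, ddr, ddphi]


def rk4_step(y, h):
    """One classical Runge-Kutta step of size h in the parameter sigma."""
    k1 = rhs(y)
    k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)])
    k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)])
    k4 = rhs([a + h * b for a, b in zip(y, k3)])
    return [a + h / 6.0 * (p + 2.0 * q + 2.0 * s + w)
            for a, p, q, s, w in zip(y, k1, k2, k3, k4)]


def invariants(y):
    """Return (A, B, c*.c*) for the state y."""
    t, r, phi, dt, dr, dphi = y
    g = 1.0 - 2.0 * M / r
    return r**2 * dphi, g * dt, g * dt**2 - dr**2 / g - r**2 * dphi**2


# Start at r = 10M with no radial velocity; dt is fixed by c*.c* = 1.
r0, dphi0 = 10.0, 0.04
dt0 = math.sqrt((1.0 + r0**2 * dphi0**2) / (1.0 - 2.0 * M / r0))
y = [0.0, r0, 0.0, dt0, 0.0, dphi0]
start = invariants(y)
r_min = r0
for _ in range(4000):          # integrate up to sigma = 200
    y = rk4_step(y, 0.05)
    r_min = min(r_min, y[1])
end = invariants(y)
```

Comparing `start` and `end` shows A, B and the unit-length condition preserved to within the integrator's error, which is what G3’ and G1’ assert.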

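3.03. Radial Motion. Equation G3’ shows that if $\frac{dc^\phi}{d\sigma}(\sigma_0) = 0$ for some $\sigma_0$,
$c^\phi$ is constant. Hence geodesics in the surface

$$S = \{ (t, r, \theta, \phi) \mid \theta = \tfrac{\pi}{2},\ \phi = \phi_0 \}$$

with coordinates (t, r) and a metric given by

$$(ds)^2 = \left(1 - \frac{2M}{r}\right)(dt)^2 - \left(1 - \frac{2M}{r}\right)^{-1}(dr)^2 \tag{*}$$

coincide with geodesics in $\mathbf{R}^4$ with the Schwarzschild metric.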

We shall discuss this analytically in a moment. Notice first, however, 
that we have already encountered a 2-manifold with a metric of the form 

(ds) 2 = f(r)(dx) 2 + h(r)(dr) 2 

with $f(r) \to 0$ and $h(r) \to \infty$ as r descends to some $q \in \mathbf{R}$: namely, the
indefinite example of IX.5.03, with $x = \theta$, q = 0, and

/(r) = r 2 , h(r) = 1 + lj . 

We have seen how the geometry of this “pushes geodesics inwards” so that 
they can rise from low r, reach a maximum and fall back. Only r and t are 
varying, so this models something “thrown straight up and falling straight 
back”. Thus we have already a qualitative example of how spacelike curvature 
can guide timelike geodesics “downwards”. That example is not asymptoti¬ 
cally flat, however - indeed a radial curve, with t constant, has finite length - 
so this surface is rather different further out. 

In $\mathbf{R}^3$ with cylindrical coordinates $(R, \theta, z)$ (to reserve r for our radial
coordinate in the Schwarzschild solution) and the same indefinite metric G
as in X.5.02 we may take the surface

$$N = \{ (R, \theta, z) \mid z = f(R),\ 0 < R < 1 \}$$

(Fig. 3.2), where f is any indefinite integral of

$$]0,1[ \to \mathbf{R} : R \mapsto \sqrt{1 + \frac{16M^2}{(1 - R^2)^4}}.$$

Now consider a map defined by

$$\phi : ]0, 2\pi[ \times ]2M, \infty[ \to \mathbf{R}^3 : (t, r) \mapsto \left( \sqrt{1 - \tfrac{2M}{r}},\ t,\ f\left(\sqrt{1 - \tfrac{2M}{r}}\right) \right),$$

still using cylindrical coordinates on $\mathbf{R}^3$.

It is clear that the image U of $\phi$ in $\mathbf{R}^3$ lies in N, that $\phi$ is injective and
that $\phi^{\leftarrow}$ defines a chart on N with domain U and image in $\mathbf{R}^2$. N with the
coordinates (t, r) given by $\phi$ has the metric induced from G given exactly by
* above (Exercise 5). The same geometric reasoning as before then explains
the “pull” on geodesics towards r = 2M. N is asymptotic to a cylinder’s
intrinsic flatness as $r \to \infty$.

Notice that we have been able to induce the Schwarzschild metric only
by embedding just part of the (r, t)-plane in a flat $\mathbf{R}^3$: if we tried to embed
a longer t-interval than $2\pi$, we would meet the same point in $\mathbf{R}^3$ more than
once. Fig. 3.2 is a realisation of the curvature of part of the surface, and has
nothing to do with its cause, which is an embedding in $\mathbf{R}^4$ with curvature
intrinsically determined by Einstein’s equation. (If a Philosopher of Science
can be brought along as far as understanding Fig. 3.2, you will have cured
him of “bent round what?” permanently. The embedding that gives this
curvature locally will be evidently non-physical, even to him.)

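For “radial motion” geodesics, G2 reduces to

$$\frac{d^2c^r}{d\sigma^2} + \frac{M(c^r - 2M)}{(c^r)^3}\left(\frac{dc^t}{d\sigma}\right)^2 - \frac{M}{c^r(c^r - 2M)}\left(\frac{dc^r}{d\sigma}\right)^2 = 0.$$

For c timelike and $\sigma$ proper time, we have

$$1 = c^*(\sigma) \cdot c^*(\sigma),$$

that is,

$$1 = \left(1 - \frac{2M}{c^r}\right)\left(\frac{dc^t}{d\sigma}\right)^2 - \left(1 - \frac{2M}{c^r}\right)^{-1}\left(\frac{dc^r}{d\sigma}\right)^2,$$

$$\left(\frac{dc^t}{d\sigma}\right)^2 = \left(1 - \frac{2M}{c^r}\right)^{-1}\left(1 + \left(1 - \frac{2M}{c^r}\right)^{-1}\left(\frac{dc^r}{d\sigma}\right)^2\right).$$

Combining these two equations,

$$\frac{d^2c^r}{d\sigma^2} + \frac{M}{(c^r)^2}\left(1 + \left(1 - \frac{2M}{c^r}\right)^{-1}\left(\frac{dc^r}{d\sigma}\right)^2\right) - \frac{M}{c^r(c^r - 2M)}\left(\frac{dc^r}{d\sigma}\right)^2 = 0,$$

which reduces to

$$\frac{d^2c^r}{d\sigma^2} + \frac{M}{(c^r)^2} = 0.$$

If at some $\sigma_0$ we have $\frac{dc^r}{d\sigma}(\sigma_0) = 0$, then $c^*(\sigma_0)$ is a scalar multiple of $\partial_t$, so
that $c^*(\sigma_0) \cdot c^*(\sigma_0) = 1$ gives

$$\frac{dc^t}{d\sigma}(\sigma_0) = \left(1 - \frac{2M}{c^r(\sigma_0)}\right)^{-1/2},$$

hence by G1’

$$\frac{dc^t}{d\sigma}(\sigma) = \frac{\left(1 - \frac{2M}{c^r(\sigma_0)}\right)^{1/2}}{1 - \frac{2M}{c^r(\sigma)}}$$

for all $\sigma$.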
Sometimes $c^r(\sigma)$ is a large enough multiple of M that we may approximate
this by 1. (Here, at our distance from the sun, it is about one part in 600,000
less.) Then we may approximate $c^t$ by $\sigma$, hence $\sigma$ by t. The result (putting
in the “gravitational constant” G for the sake of familiarity) is

$$\frac{d^2c^r}{dt^2} = -\frac{MG}{(c^r)^2}$$

as an approximation to the geodesic equation. This of course is exactly
Newton’s result for radial motion of a particle solely influenced by the
gravitational effect of a fixed mass centred at r = 0.


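3.04. Orbital Motion. A timelike geodesic modelling the movements we
see in the solar system has $\frac{dc^t}{d\sigma}$ very much larger than the other components
of $c^*$. Thus at any point the geodesic is nearly tangent (Fig. 3.3) to a “radial
motion” one with $\frac{dc^\theta}{d\sigma} = \frac{dc^\phi}{d\sigma} = 0$, and so by continuity has a nearly
identical apparent “acceleration towards the sun”. (Hence the applicability
of Newtonian theory as an approximation in this case also.) Fig. 3.2 thus
remains a better visualisation of why planets seem “pulled towards the sun”
than any embedding in a flat space of the hypersurface $\theta = \frac{\pi}{2}$ that we might
construct, since this would involve at least one more dimension than we can
draw.

Consider a timelike geodesic c parametrised by proper time with $c^\theta = \frac{\pi}{2}$
identically, and $\frac{dc^\phi}{d\sigma} \neq 0$ for some and hence by G3’ all $\sigma$. Using the
“unit length” condition again, we have

$$1 = \left(1 - \frac{2M}{c^r}\right)\left(\frac{dc^t}{d\sigma}\right)^2 - \left(1 - \frac{2M}{c^r}\right)^{-1}\left(\frac{dc^r}{d\sigma}\right)^2 - (c^r)^2\left(\frac{dc^\phi}{d\sigma}\right)^2$$

or,

$$\left(\frac{dc^t}{d\sigma}\right)^2 = \left(1 + \left(1 - \frac{2M}{c^r}\right)^{-1}\left(\frac{dc^r}{d\sigma}\right)^2 + (c^r)^2\left(\frac{dc^\phi}{d\sigma}\right)^2\right)\left(1 - \frac{2M}{c^r}\right)^{-1}.$$

Using this to eliminate $c^t$ from G2,

$$\frac{d^2c^r}{d\sigma^2} + \frac{M}{(c^r)^2}\left(1 + \left(1 - \frac{2M}{c^r}\right)^{-1}\left(\frac{dc^r}{d\sigma}\right)^2 + (c^r)^2\left(\frac{dc^\phi}{d\sigma}\right)^2\right) - \frac{M}{c^r(c^r - 2M)}\left(\frac{dc^r}{d\sigma}\right)^2 - (c^r - 2M)\left(\frac{dc^\phi}{d\sigma}\right)^2 = 0,$$

which reduces to

$$\frac{d^2c^r}{d\sigma^2} - (c^r - 3M)\left(\frac{dc^\phi}{d\sigma}\right)^2 + \frac{M}{(c^r)^2} = 0. \tag{*}$$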
Fig. 3.3. Even a spiral curve (a) at constant radius is nearly tangent
to a purely radial motion (b) if $\frac{dc^t}{d\sigma}$ is much greater than $\frac{dc^\phi}{d\sigma}$.


This immediately gives “circular” orbits (helices in spacetime) of constant
“radius” $c^r = a$: $\frac{dc^r}{d\sigma} = 0$ and G3’ becomes

$$a^2\frac{dc^\phi}{d\sigma} = A, \quad\text{so}\quad \frac{dc^\phi}{d\sigma} = \frac{A}{a^2} = \frac{2\pi}{T}$$

where T is the period of the orbit, measured in proper time. Substituting in
*, T must be precisely $2\pi\sqrt{a^2(a - 3M)/M}$; cf. Exercise 6c. (If a < 6M, the orbit
cannot be stable: the smallest perturbation inward will make it spiral down
to r = 2M. See [Misner, Thorne and Wheeler].)
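
As a rough numerical illustration of the circular-orbit case, the sketch below evaluates the proper-time period $T = 2\pi\sqrt{a^2(a - 3M)/M}$ obtained by setting $d^2c^r/d\sigma^2 = 0$ in *, for the three orbits of Exercise 6b. The radii, masses, speed of light and length of the year are the figures given in that exercise; the function and variable names are assumptions made for this example.

```python
import math

C = 3.0e8                  # speed of light in metres per second (Exercise 6b)
YEAR = 1.0e7 * math.pi     # one year in seconds (Exercise 6b)


def proper_period(a, M):
    """Proper-time period, in metres, of a circular orbit of radius a.

    From * with constant radius: (a - 3M) (dphi/dsigma)^2 = M / a^2,
    and dphi/dsigma = 2*pi / T.
    """
    return 2.0 * math.pi * math.sqrt(a**2 * (a - 3.0 * M) / M)


# (orbit radius, central mass), both in metres, as in Exercise 6b.
orbits = {
    "moon": (3.9e8, 4.4e-3),
    "earth": (1.5e11, 1.5e3),
    "sun": (3.0e20, 2.2e14),
}

periods_in_years = {
    name: proper_period(a, M) / C / YEAR for name, (a, M) in orbits.items()
}
```

Dividing by C converts geometrised metres of proper time into seconds. The moon and earth come out close to 1/13 year and 1 year, and the sun's orbit in the galaxy at the order of $10^8$ years.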
Substituting in * from G3’ we have the precise form

$$\frac{d^2c^r}{d\sigma^2} = -\frac{M}{(c^r)^2} + \frac{A^2(c^r - 3M)}{(c^r)^4}$$

for the “radial acceleration” of a timelike geodesic in Schwarzschild geometry,
outside the Schwarzschild radius.

Let us now “reparametrise c by $\phi$”: since $\frac{dc^\phi}{d\sigma}$ never vanishes, $c^\phi$ has
a smooth inverse at least locally by the Inverse Function theorem. Take
$\psi : ]0, 2\pi[ \to \mathbf{R}$ such that $c^\phi(\psi(\phi)) = \phi$. ($c^\phi \circ \psi$ is often denoted simply by
$\phi$, as is $c^\phi$. In this context the result is to denote three
not-merely-formally-distinct objects, the real number (coordinate label) $\phi$, and the two maps $c^\phi$
and $c^\phi \circ \psi$, by the same letter, in an attempt to “simplify”.) We denote the
corresponding reparametrisation $c^r \circ \psi$ of the r-coordinate function $c^r$ by $\bar r$.

Substituting in * the consequence (Exercise 7a)

$$\frac{d^2c^r}{d\sigma^2} = \frac{A^2}{\bar r^4}\left(\frac{d^2\bar r}{d\phi^2} - \frac{2}{\bar r}\left(\frac{d\bar r}{d\phi}\right)^2\right)$$

of G3, we get

$$\frac{d^2\bar r}{d\phi^2} - \frac{2}{\bar r}\left(\frac{d\bar r}{d\phi}\right)^2 - \bar r + 3M + \frac{M\bar r^2}{A^2} = 0.$$

Setting $u(\phi) = \frac{1}{\bar r(\phi)}$ this is equivalent (Exercise 7) to

$$\frac{d^2u}{d\phi^2} + u - 3Mu^2 - \frac{M}{A^2} = 0.$$

By Exercise 8a this has an approximate solution in polar coordinates as a
conic section with focus at r = 0, given by

$$\frac{1}{\bar r(\phi)} = u(\phi) = \frac{M}{A^2}\left(1 + e\cos(\phi - \phi_0)\right).$$

Here e is the eccentricity (e = 0, 0 < e < 1, e = 1, e > 1 for circle, ellipse,
parabola, hyperbola respectively) and $\phi_0$ is the value of $\phi$ at closest approach to the
origin (the perihelion - Greek for “near the sun”).

An iterative procedure like the one used in the Appendix to converge on
the solution of a differential equation yields the next approximation (Exercise 8b)

$$u(\phi) = \frac{M}{A^2}\left(1 + \frac{3M^2}{A^2}\right)\left(1 + e\cos(\phi - p(\phi))\right),$$

where

$$p(\phi) = \phi_0 + \frac{3M^2\phi}{A^2};$$

for e < 1 this is “an ellipse whose perihelion is slowly rotating”. This approximation
is good enough for the purpose of solar system astronomy. It predicts
precession rates of 43, 8 and 4 seconds of arc per century for geodesics modelling
the orbits of Mercury, Venus and Earth respectively for instance, in
good agreement with observation. For parabolic and hyperbolic orbits the
difference from strict conic sections is too small to detect.
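
The figure of 43 seconds of arc per century for Mercury can be recovered with a few lines of arithmetic. The sketch below assumes that the perihelion angle advances by $3M^2\phi/A^2$, so by $6\pi M^2/A^2$ per revolution, and that $A^2/M$ may be replaced by the Newtonian semi-latus rectum $a(1 - e^2)$ of the conic. The solar mass is taken as half the Schwarzschild radius of 2.95 kilometres quoted in 3.01. The orbital elements of Mercury are standard astronomical values supplied here as assumptions; they are not given in the text.

```python
import math

M_SUN = 2.95e3 / 2.0        # metres: half the Schwarzschild radius quoted in 3.01
SEMI_MAJOR_AXIS = 5.79e10   # metres, Mercury (assumed astronomical value)
ECCENTRICITY = 0.2056       # Mercury (assumed astronomical value)
PERIOD_DAYS = 87.97         # Mercury's orbital period in days (assumed)
DAYS_PER_CENTURY = 36525.0


def precession_per_orbit(M, a, e):
    """Perihelion advance per revolution, in radians: 6*pi*M^2/A^2.

    Uses A^2 / M = a (1 - e^2), the semi-latus rectum of the conic.
    """
    semi_latus_rectum = a * (1.0 - e**2)
    return 6.0 * math.pi * M / semi_latus_rectum


def arcsec_per_century(M, a, e, period_days):
    """Accumulated perihelion advance over one century, in seconds of arc."""
    revolutions = DAYS_PER_CENTURY / period_days
    radians = precession_per_orbit(M, a, e) * revolutions
    return math.degrees(radians) * 3600.0


mercury = arcsec_per_century(M_SUN, SEMI_MAJOR_AXIS, ECCENTRICITY, PERIOD_DAYS)
```

With these inputs `mercury` comes out at about 43 seconds of arc per century, consistent with the rate quoted above.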

The analysis of null geodesics can be carried out on similar lines: light
grazing the sun is “bent towards it” with an apparent deflection of 1.75 seconds
of arc. Apart from its value as a test for general relativity, the effect
of our local gravitation on light is thus not very significant or easy to detect;
the geometry of optics in a vacuum only becomes dramatically different
from the flat case in the vicinity of a black hole, or a body extending not far
outside its Schwarzschild radius and so in danger of falling completely in, in
gravitational collapse.

3.05. Spacelike Geodesics. Consider the surface $S = \{ (r, t, \phi, \theta) \mid t = t_0,\ r > 2M,\ \theta = \tfrac{\pi}{2} \}$.
By G1’ geodesics anywhere tangent to S remain in it, so
G2 reduces, much as in 3.03, to

$$\frac{d^2c^r}{ds^2} - \frac{M}{c^r(c^r - 2M)}\left(\frac{dc^r}{ds}\right)^2 - (c^r - 2M)\left(\frac{dc^\phi}{ds}\right)^2 = 0.$$

(We denote the arc length parameter along spacelike geodesics by s, since $\sigma$
is proper time.) Clearly $\frac{d^2c^r}{ds^2}$ is always positive, unlike timelike radial motion
where it is always negative or orbital motion where it is positive at perihelion,
negative at aphelion (if any). So these geodesics have no aphelion (furthest
point from the sun). As one might expect, something “infinitely fast” is above
escape velocity. By Exercise 9 we can realise the negative definite metric

$$-(ds)^2 = \left(1 - \frac{2M}{r}\right)^{-1}(dr)^2 + r^2(d\phi)^2$$

of S by embedding it in $\mathbf{R}^3$ with minus the Euclidean metric and cylindrical
coordinates $(r, \phi, z)$ as

$$N = \{ (r, \phi, z) \mid z = \sqrt{8M(r - 2M)},\ r > 2M \}$$

(Fig. 3.4a) and using the r and $\phi$ of $\mathbf{R}^3$ as coordinates on it. (For an uncollapsed
star, with $r_0$ greater than the Schwarzschild radius, we have of course
a positively curved “cap” in the region $r < r_0$; Fig. 3.4b.)
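
The embedding can be checked numerically. On the surface $z = \sqrt{8M(r - 2M)}$ the Euclidean line element restricted to constant $\phi$ is $(1 + (dz/dr)^2)(dr)^2$, which should equal $(1 - 2M/r)^{-1}(dr)^2$. The sketch below compares the two coefficients at a few radii, using a central difference for $dz/dr$. The value M = 1, the sample radii and the step size are arbitrary assumptions.

```python
import math

M = 1.0  # geometrised mass; illustrative value (assumption)


def z(r):
    """Height of the embedded surface N above the plane z = 0."""
    return math.sqrt(8.0 * M * (r - 2.0 * M))


def induced_grr(r, h=1e-6):
    """Coefficient of (dr)^2 induced on N by the Euclidean metric."""
    dzdr = (z(r + h) - z(r - h)) / (2.0 * h)
    return 1.0 + dzdr**2


def schwarzschild_grr(r):
    """Coefficient of (dr)^2 in -(ds)^2 on the surface S."""
    return 1.0 / (1.0 - 2.0 * M / r)


sample_radii = (2.5, 3.0, 10.0, 100.0)
differences = [induced_grr(r) - schwarzschild_grr(r) for r in sample_radii]
```

The entries of `differences` are zero up to discretisation error. The $r^2(d\phi)^2$ term agrees automatically, since r and $\phi$ are the cylindrical coordinates of $\mathbf{R}^3$.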

The situation is clearly qualitatively similar to the Riemannian example
in IX.5.03, for r greater than $r_0$ and 2M, and the shapes of these “instantaneous
travel” orbits may be found experimentally by stretching strings on a
model. Again, they are clearly “deflected towards the sun”.

Fig. 3.4 (panels (a) and (b))


As no experimental test for the spacelike solutions of the Schwarzschild 
geodesics is in prospect, we leave the interested reader to investigate them. 
Are they sometimes (when?) approximately conics? 


Exercises XII. 3 

1. a) If J(ti,v,tn) = v(R(u,tt>)) -w>(R(u,u)) +^(w(R)u-v — v(R)u-w) 

(where, for example, w(R) means dR(w); cf. VII.4.02), show that 
J is a (g)-tensor field and the 2nd Bianchi identity is equivalent to 
div C = J. 

b) Use Einstein’s equation to give J in terms of T. 

2. Compute the connection coefficients, Riemann tensor and hence Ricci 
tensor of the metric 

$$(ds)^2 = e^{\lambda}(dt)^2 - e^{\xi}(dr)^2 - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2.$$

(You will need the fact that if $g(t, r, \theta, \phi) = g(r)$, then $\partial_r g = \frac{dg}{dr}$.)

3. a) Inserting functions $\lambda(r) = \log\left(1 - \frac{2M}{r}\right)$, $\xi(r) = -\lambda(r)$ in Exercise 2, or
otherwise, find the connection coefficients of the Schwarzschild metric.

b) Substitute the result into the general geodesic equation to get (i) - (iv)
of 3.02.

4. Use equation G1’, and the Schild argument of XI.5 in reverse, to compute
the ratio of the measured emission and reception frequencies for a
photon going from $(t_1, r_1, \theta_1, \phi_1)$ to $(t_2, r_2, \theta_2, \phi_2)$ in the Schwarzschild
solution, with $r_1, r_2 > r_0 > 2M$. (Note that for an observer in a “fixed
position”

$$\frac{dc^r}{d\sigma} = \frac{dc^\theta}{d\sigma} = \frac{dc^\phi}{d\sigma} = 0$$

while $c^*(\sigma) \cdot c^*(\sigma) = 1$ by definition of proper - measured - time.)

5. Compute the metric induced on the surface N of 3.03 by the metric 
$(ds)^2 = (dR)^2 + R^2(d\theta)^2 - (dz)^2$, using the chart given.

6. a) In the case of radial motion, is there a difference between general
relativistic and Newtonian values for escape speed at a point $x = (t, r, \theta, \phi)$?
(The escape speed is the least speed, relative to $(\partial_t)_x$ as
“rest”, such that $\frac{dc^r}{d\sigma} > 0$ for all succeeding $\sigma$.)

b) Assume that the speed of light is $3.0 \times 10^8$ metres per second and also
that one year is $10^7\pi$ seconds (both are true nearly to 3 significant
figures). Suppose that a moon, earth and sun are in circular orbits
of radii $3.9 \times 10^8$, $1.5 \times 10^{11}$ and $3.0 \times 10^{20}$ metres around the earth,
sun and galaxy whose masses (in geometrised units) are $4.4 \times 10^{-3}$,
$1.5 \times 10^3$ and $2.2 \times 10^{14}$ metres respectively.

Show that their periods are approximately 1/13 year, 1 year and
$10^8$ years, respectively.

c) For non-radial motion, does escape speed depend on initial direction
relative to $(\partial_t)_x$ as “rest”? (In Newtonian theory, it depends only on
the orbit not hitting the sun. Is this still true? Compare circular
orbits with radial motion.)

7. a) Use the chain rule to state G3 in terms of the identity map $c^\phi \circ \psi$ and
the reparametrisation $\bar r$, and deduce the consequence used in 3.04.

b) Prove the equivalence of the differential equations stated in 3.04 for $\bar r$
and $u = \frac{1}{\bar r}$.

8. a) If u is a function $\mathbf{R} \to \mathbf{R}$, define a new function

$$D(u) = \frac{d^2u}{d\phi^2} + u - 3Mu^2 - \frac{M}{A^2}$$

and show that the function u given by the conic section equation
satisfies $|D(u)| < k$ for some explicitly given real number k. When is
it reasonable to treat k as zero (that is, use the conic as if it were an
exact solution)?

b) Repeat (a) for the “precessing ellipse” equation.

c) What is the difference between “u is approximately a solution” as in
(a) and (b), and “u approximates an exact solution $\bar u$”, in the sense
that there is a small $\delta$ with $|u(\phi) - \bar u(\phi)| < \delta$ for all $\phi$? Are the
two equivalent (i) for geodesics with domain $\mathbf{R}$, (ii) for geodesics with
domain a compact interval?

9. Compute the metric induced on the surface N of 3.05 by minus the 
Euclidean metric on R 3 . 

4. Farewell Particle 

We can now see how the result of 1.03 is a little fictitious. 

Any physically significant particle must have 4-momentum. We cannot 
have, say, a particle with only charge: the charge can produce changes in the 
4-momentum of a charged particle with rest mass, so if it persists in having 
zero 4-momentum it violates conservation. (If not we can consider it as 
coming into existence, like, say, an emitted photon.) So either conservation 
is false, or a zero-4-momentum particle can interact with nothing we can 
interact with, and is thus physically meaningless as far as we are concerned. 

Now a Newtonian particle of mass m, as the limit of little balls of radius
r, density $3m/(4\pi r^3)$, as $r \to 0$, makes a moderate amount of sense. The strength
of the gravitational field tends to infinity as we approach the particle, which 
is odd, and the energy to be obtained by letting two particles fall towards 
each other is infinite, which is odder. However, these oddities can be dodged. 
For relativistic particles there are more fundamental problems. 

The particle has its own effect on spacetime. This means first that the metric
of an asymptotically flat spacetime containing one sun and one particle is
not exactly the Schwarzschild metric anywhere, any more than the Newtonian
“central field of force” exactly allows for a space probe’s effect on the
sun. More importantly, the singularity of gravitation involves a singularity in
spacetime itself: an infinity in the curvature is inconsistent with any
pseudo-Riemannian structure. (The “singularity” at the Schwarzschild radius is an
artifact of the chart, as remarked above, but the singularity at r = 0 is not.)
We can hardly say that the curve followed is a geodesic, if the structure by 
which we define “geodesic” breaks down at every point the particle visits. 

We cannot avoid this problem, as we could the milder Newtonian analogue,
by treating the particle as of arbitrarily small non-zero diameter and
correspondingly high density. Once a body of matter, of any mass m, lies 
inside its Schwarzschild radius 2m it undergoes gravitational collapse (see 
[Hawking and Ellis] or [Misner, Thorne and Wheeler] for physics inside a 
black hole) and the singularity becomes physical, not a limiting fiction. 
Nor can we say that the centre of mass of a larger body follows a geodesic,
because “centre of mass” cannot be defined relativistically. Moreover there is 
no such thing as a rigid body, that is one such that a push at one side starts 
the whole body moving at once. (What does “at once” mean all over the 
body?) If it is “rigid in the frame of reference F”, even in flat spacetime, this 
means that the F speed of sound in it is infinite: and there are many frames 
in which the push travels through the body backwards in time. Thus while 
the assumption that planets are rigid spheres allows Newtonian mechanics 
to treat their orbits as those of points, there is no mathematically or physically
practical way of ignoring their “internal vibrations” without ignoring
relativity. (For a coherent treatment of the relativistic dynamics of classical 
matter without the simplifications/approximations usual in cosmology texts, 
see [Dixon].) 

The utility of geodesics, then, lies in the following rather complicated 
fact, which we state without proof. Suppose in some spacetime M we have a 
body of matter, or black hole, P, with mass and diameter (suitably approximately
defined) small in comparison to its separation from the other parts
of M with $T \neq 0$. Let U be a tubular region surrounding the track of P.
Then we can approximate $M \setminus U$ by $M' \setminus U'$, where $M'$ is a spacetime similar
to M except that P is absent, and $U'$ surrounds a geodesic. This, precisely
formulated, is the more careful statement of the “particles follow geodesics”
of 1.03, and the “planets follow geodesics” of §3. Strictly speaking, general
relativity does not admit the point particles of classical mechanics.
πάντα ῥεῖ

Heraclitus of Ephesus 


1. Completeness 

Axiom VI.4.01 was essentially one-dimensional. For general use we need more 
apparatus: 

1.01. Definition. A Cauchy sequence in a metric space (X, d) (defined in
VI.1.02) is a sequence $S : \mathbf{N} \to X : i \mapsto x_i$ in X such that for any $\varepsilon > 0$ there
is an $M \in \mathbf{N}$ (generally larger for smaller $\varepsilon$) with the property that

$$m, n > M \Rightarrow d(x_m, x_n) < \varepsilon, \quad\text{(Exercise 1).}$$

If some subsequence S' of a Cauchy sequence S converges, say to $x \in X$,
so does S:

Any neighbourhood U of x contains an open ball $B(x, \varepsilon)$, by Definition VI.1.07
of the metric topology. Since S' converges and S is Cauchy,
there are $L, M \in \mathbf{N}$ such that $d(x, x_i) < \frac{\varepsilon}{2}$ for i > L, $x_i$ a point of S',
and $d(x_i, x_n) < \frac{\varepsilon}{2}$ for i, n > M. So for n > M the triangle inequality gives
$d(x, x_n) \le d(x, x_i) + d(x_i, x_n) < \varepsilon$, for $x_i$ any point of S' with $i > \max\{L, M\}$;
hence $n > M \Rightarrow x_n \in U$. Since U was arbitrary, S thus converges to x.

Combining Exercise 1 and VI.4.04, a sequence in a compact interval [a, b]
with the usual metric converges if and only if it is Cauchy. But any Cauchy
sequence in $\mathbf{R}$ lies in a compact interval ($[\min(K) - 1, \max(K) + 1]$ where
$M \in \mathbf{N}$ has $m, n > M \Rightarrow |x_m - x_n| < 1$, $K = \{x_1, x_2, \ldots, x_{M+1}\}$). So any
Cauchy sequence in $\mathbf{R}$ converges. Conversely (Exercise 2) this fact implies
the Intermediate Value Theorem. Thus VI.4.01 is a specialisation of

1.02. Definition. A metric space is complete if all Cauchy sequences in it 
converge (cf. Exercise 3). 


Exercises A.l 

1. a) If $i \mapsto x_i$ in a metric space (X, d) converges to x, show that for any $\varepsilon > 0$
there is $M \in \mathbf{N}$ such that $m, n > M \Rightarrow d(x_m, x) < \frac{\varepsilon}{2}$, $d(x, x_n) < \frac{\varepsilon}{2}$.

b) Deduce that a sequence in a metric space is necessarily Cauchy if it
converges with respect to the metric topology.

2. a) Suppose $f : \mathbf{R} \to \mathbf{R}$ is continuous, $f(\mathbf{R}) = \{-1, +1\}$, and f(a) = -1,
f(b) = +1. Construct a Cauchy sequence $S : i \mapsto x_i$ such that
$i \mapsto f(x_i)$ does not converge. (Hint: set $x_1 = a$, $x_2 = \frac{1}{2}(a + b)$,
$x_3 = \frac{1}{4}(a + 3b)$, ... until the first i with $f(x_i) = +1$; by continuity at
b, this must happen for i finite. If $f(x_n) = -1$ get $x_{n+1}$ by moving
towards the most recent $x_i$ with $f(x_i) = +1$, and vice versa.
Prove S Cauchy and $f \circ S$ divergent.)

b) Deduce by VI.2.02 that S does not converge. 

c) Deduce that if all Cauchy sequences in R do converge, the Intermediate 
Value Theorem is true for all real numbers. 

3. a) Deduce from Exercise VI.3.8c that if X is a finite-dimensional real 

vector space and S : N —► X is Cauchy in one of the metrics of 
Exercise VI.3.5, for some basis, it is Cauchy in them all, for any basis. 
(N.B. There exist metrisations of the usual topology for which v, 2v, 
3v, ... is Cauchy.) 

b) Deduce that if R is complete (which we shall continue to assume), so 
is X in the metric given by any norm (cf. Exercise VI.3.8). 


2. Two Fixed Point Theorems 

Throughout this section $f^n$ will mean $f \circ f \circ \cdots \circ f$ (n times), not a component
function, and $f^0(x)$ will mean x: similarly for $F^n$.

2.01. Definition. $p \in X$ is a fixed point of $f : X \to X$ if f(p) = p. For X
a Hausdorff space, p is an attracting fixed point (Exercise 1) if for arbitrary
$x \in X$, $\lim_{n \to \infty} f^n(x)$ exists and is p.

2.02. Definition. For (X, d) a metric space, $\lambda \in ]0,1[$, a map $f : X \to X$ is
a $\lambda$-contraction if $d(f(x), f(y)) \le \lambda\, d(x, y)$ for all $x, y \in X$.

2.03. Shrinking Lemma. If (X, d) is complete and $f : X \to X$ is a
$\lambda$-contraction, f has an attracting fixed point.

Proof. For any $x \in X$, $S_x : i \mapsto f^{i-1}(x) = x_i$ is Cauchy:

$$d(x_n, x_{n+1}) = d(f(x_{n-1}), f(x_n)) \le \lambda\, d(x_{n-1}, x_n) \le \cdots \le \lambda^{n-1} d(x_1, x_2) = \lambda^{n-1}k, \text{ say.}$$

If m > n, repeated use of the triangle inequality gives

$$d(x_n, x_m) \le d(x_n, x_{n+1}) + d(x_{n+1}, x_{n+2}) + \cdots + d(x_{m-1}, x_m) \le (\lambda^{n-1} + \lambda^n + \cdots + \lambda^{m-2})k < \frac{\lambda^{n-1}k}{1 - \lambda}, \quad\text{(Exercise 2).}$$

But $i \mapsto \lambda^{i-1}$ converges to 0 by Exercise VI.4.8, so for any $\varepsilon > 0$ there is
$M \in \mathbf{N}$ with

$$n > M \Rightarrow \lambda^{n-1} < \frac{\varepsilon(1 - \lambda)}{k} \Rightarrow d(x_n, x_m) < \varepsilon \text{ for } m > n.$$

Similarly, $m > M \Rightarrow d(x_n, x_m) < \varepsilon$ for n > m. Combining these facts,
$m, n > M \Rightarrow d(x_n, x_m) < \varepsilon$.

Thus since X is complete, $S_x$ converges to some $p \in X$. For $y \in X$, $S_y$
similarly converges to $q \in X$. By Exercise 1b both p and q are fixed, thus
$d(p, q) = d(f(p), f(q)) \le \lambda\, d(p, q)$, so d(p, q) = 0 so p = q by Axiom VI.1.02 ii.
Hence p is an attracting fixed point for f. □
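
The proof is constructive, and the estimate $d(x_n, x_m) < \lambda^{n-1}k/(1 - \lambda)$ gives a stopping rule. The following is a minimal sketch on the complete metric space $\mathbf{R}$ with the usual metric. The map $x \mapsto \frac{1}{2}\cos x$ is an example chosen here (it is a $\frac{1}{2}$-contraction because its derivative is bounded by $\frac{1}{2}$), and the function name `fixed_point` is hypothetical.

```python
import math


def fixed_point(f, x1, lam, eps):
    """Iterate x_{i+1} = f(x_i) until the a priori bound drops below eps.

    With k = d(x_1, x_2), the proof gives d(x_n, p) <= lam**(n-1) * k / (1 - lam)
    for the limit p. Returns (x_n, n) for the first n meeting the bound.
    """
    k = abs(f(x1) - x1)
    x, n = x1, 1
    bound = k / (1.0 - lam)       # bound on d(x_1, p)
    while bound >= eps:
        x = f(x)                  # x is now x_{n+1}
        n += 1
        bound *= lam              # bound on d(x_n, p) for the new n
    return x, n


def half_cos(x):
    """A 1/2-contraction of R, chosen only as an example."""
    return 0.5 * math.cos(x)


p, steps = fixed_point(half_cos, 3.0, 0.5, 1e-10)
p_other, _ = fixed_point(half_cos, -40.0, 0.5, 1e-10)
```

Starting from two quite different points gives the same limit to within the tolerance, which is the “attracting” part of the statement.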

We need also a similar but more intricate result, first proved in [Hirsch 
and Pugh]: 

2.04. Fibre Contraction Theorem. Let X be a Hausdorff space, (Y, d)
a complete metric space, and $F : X \times Y \to X \times Y$ a fibre map over the
projection $\pi_1 : X \times Y \to X : (x, y) \mapsto x$. That means $\pi_1 F(x, y) = \pi_1 F(x, y')$
for all $x \in X$, $y, y' \in Y$ (Fig. 2.1). Equivalently, we can write F in the form:

$$F(x, y) = (f(x), f_x(y))$$

where $f : X \to X$ and each $f_x : Y \to Y$.

Fig. A.2.1

Suppose that $\lambda \in ]0,1[$ and

(a) For each $y \in Y$, the map $X \to Y : x \mapsto f_x(y)$ is continuous. (True if, for
example, F is continuous.)

(b) f has an attracting fixed point $p \in X$. (True by 2.03 if X is a complete
metric space, and f a $\lambda$-contraction.)

(c) Each $f_x$ is a $\lambda$-contraction of (Y, d).

Then if $q \in Y$ is the attracting fixed point of $f_p$ given by 2.03, the point
$(p, q) \in X \times Y$ is an attracting fixed point of F.

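Proof. Choose $x \in X$, set $x_i = f^{i-1}(x)$, $\delta_i = d(q, f_{x_i}(q))$. Then $\lim_{n\to\infty} \delta_n = 0$, for

$$\lim_{n\to\infty} f_{x_n}(q) = \lim_{n\to\infty} \pi_2(F(x_n, q)) = \pi_2\bigl(F(\lim_{n\to\infty} (x_n, q))\bigr) \quad \text{by VI.2.02 and (a)}$$
$$= \pi_2(F(p, q)) = f_p(q) = q.$$

For any $y \in Y$, $\pi_2(F^n(x,y)) = f_{x_n} \circ \cdots \circ f_{x_1}(y)$, so using the triangle
inequality:

$$d(\pi_2(F^n(x,q)), q) \le d(f_{x_n} \circ \cdots \circ f_{x_1}(q), f_{x_n}(q)) + d(f_{x_n}(q), q)$$
$$\le \lambda\, d(f_{x_{n-1}} \circ \cdots \circ f_{x_1}(q), q) + \delta_n$$
$$\le \lambda\bigl[\lambda\, d(f_{x_{n-2}} \circ \cdots \circ f_{x_1}(q), q) + \delta_{n-1}\bigr] + \delta_n$$
$$\le \cdots \le \lambda^{n-1} d(f_{x_1}(q), q) + \lambda^{n-2}\delta_2 + \cdots + \lambda\,\delta_{n-1} + \delta_n$$
$$= \sum_{i=1}^{n} \lambda^{n-i}\delta_i = \Sigma_n \text{ for short.}$$

But $\lim_{n\to\infty} \Sigma_n = 0$; setting $k = \tfrac12 n$ if $n$ is even, $\tfrac12(n-1)$ otherwise, and
$M_k = \sup\{\, \delta_j \mid j > k \,\}$ (which exists for any $k$, since $\delta_j \to 0$) we have

$$\Sigma_n = \sum_{i=1}^{k} \lambda^{n-i}\delta_i + \sum_{i=k+1}^{n} \lambda^{n-i}\delta_i$$
$$\le (\lambda^{n-1} + \cdots + \lambda^{n-k})\,M_0 + (\lambda^{n-k-1} + \cdots + \lambda + 1)\,M_k$$
$$\le \frac{\lambda^{n-k} M_0}{1-\lambda} + \frac{M_k}{1-\lambda} \quad \text{by Exercise 2.}$$

As $n \to \infty$, so do $k$ and $(n - k)$. $\lim_{(n-k)\to\infty} \lambda^{n-k} = 0$ by Exercise VI.4.8 and
$\lim_{k\to\infty} M_k = 0$ by the convergence to zero of $\delta_j$, so we have $\lim_{n\to\infty} \Sigma_n = 0$
also. Hence since

$$d(\pi_2(F^n(x,y)), q) \le d(\pi_2(F^n(x,y)), \pi_2(F^n(x,q))) + d(\pi_2(F^n(x,q)), q) \le \lambda^n d(y,q) + \Sigma_n,$$

$\lim_{n\to\infty} d(\pi_2(F^n(x,y)), q) = 0$, so by Exercise VI.3.7b, $d(\lim_{n\to\infty} \pi_2(F^n(x,y)), q) = 0$.
Thus $\pi_2(\lim_{n\to\infty} F^n(x,y)) = q$, and since $\pi_1(\lim_{n\to\infty} F^n(x,y)) = p$ we get

$$\lim_{n\to\infty} F^n(x,y) = (p,q).$$

□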
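A small numerical sketch may make the theorem concrete. Everything specific here is an assumption chosen for illustration: $X = Y = \mathbb{R}$, the base map $f(x) = x/2 + 1$ (attracting fixed point $p = 2$) and the fibre maps $f_x(y) = y/3 + x$ (each a 1/3-contraction, continuous in $x$). The fixed point of $f_p$ is then $q = 3$.

```python
def F(x, y):
    """Illustrative fibre map F(x, y) = (f(x), f_x(y)).

    Base map:  f(x)   = x/2 + 1   (attracting fixed point p = 2).
    Fibre map: f_x(y) = y/3 + x   (a 1/3-contraction for every x).
    """
    return 0.5 * x + 1.0, y / 3.0 + x


def iterate_fibre_map(x, y, steps):
    """Return F^steps(x, y)."""
    for _ in range(steps):
        x, y = F(x, y)
    return x, y


# q solves y = y/3 + 2, i.e. q = 3, so (p, q) = (2, 3).
limit_a = iterate_fibre_map(-7.0, 40.0, 200)
limit_b = iterate_fibre_map(100.0, -5.0, 200)
```

Note that the fibre maps $f_{x_n}$ change at every step, so the second coordinate is not an orbit of a single contraction; the theorem is what guarantees it still converges to $q$.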


Exercises A.2 

1. a) Give examples of continuous maps $S^1 \to S^1$ with no fixed points, and
      with several.

   b) If $X$ is Hausdorff and $i \mapsto f^{i-1}(x)$ converges to $p$ for some $x \in X$,
      show that $f(p) = p$, for $f$ continuous.

   c) If, further, all such sequences converge to $p$, show that $p$ is the only
      fixed point of $f$ (regardless of whether $f$ is continuous).

2. a) Show that for any $\lambda \in \mathbb{R}$, $(1 - \lambda)(1 + \lambda + \cdots + \lambda^n) = (1 - \lambda^{n+1})$.

   b) Deduce that if $0 < \lambda < 1$, $m > n$, then $(\lambda^{n-1} + \lambda^n + \cdots + \lambda^{m-2}) < \dfrac{\lambda^{n-1}}{1-\lambda}$.

3. Sequences of Functions 

3.01. Definition. If $X$, $Y$ are topological spaces and $f_i$ are maps $X \to Y$,
$i \in \mathbb{N}$, a function $f : X \to Y$ is their (unique if $Y$ is Hausdorff) pointwise
limit if for every $x \in X$, $\lim_{n\to\infty} f_n(x)$ exists and equals $f(x)$.

Unfortunately, $f$ may be less nice than the $f_n$. Thus, with $X = Y = \mathbb{R}$,
if $f_n(x) = (nx^2 + 1)^{-1}$ each $f_n$ is $C^\infty$ but their pointwise limit is the
discontinuous function "$f(x) = 0$ if $x \ne 0$, $f(0) = 1$" (cf. Fig. A.3.1).

The trouble is that $i \mapsto f_i(x)$ converges more and more slowly as $x$
approaches 0: for any $n$ we can find $x \ne 0$ with $f_n(x)$ still arbitrarily close
to 1. If $Y$ is a metric space, we can define a stronger kind of convergence
that behaves better.
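The slow convergence near 0 can be seen directly. The sketch below evaluates $f_n(x) = (nx^2+1)^{-1}$ at a fixed point and at a point that moves towards 0 as $n$ grows; the particular choice $x_n = 1/(10\sqrt{n})$ and the function names are assumptions made for illustration.

```python
import math


def f(n, x):
    """The n-th function of the example: f_n(x) = 1 / (n x^2 + 1)."""
    return 1.0 / (n * x * x + 1.0)


def witness(n):
    """A point x != 0, depending on n, at which f_n is still close to 1."""
    return 1.0 / (10.0 * math.sqrt(n))


ns = (1, 100, 10_000, 1_000_000)
values_at_fixed_x = [f(n, 0.1) for n in ns]          # tends to 0 as n grows
values_at_witness = [f(n, witness(n)) for n in ns]   # stays near 1/1.01
```

At the fixed point the values shrink towards the pointwise limit 0, while at the moving point they stay near $1/1.01$ for every $n$, so no single $M$ works for all $x$ at once.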

3.02. Definition. If $X$ is any set, $(Y,d)$ a metric space, and $i \mapsto f_i$ a
sequence of functions $X \to Y$, then $f : X \to Y$ is its uniform limit, and they
converge uniformly to $f$, if for any $\varepsilon > 0$ there is an $M \in \mathbb{N}$ such that

$$n > M,\ x \in X \;\Rightarrow\; d(f(x), f_n(x)) < \varepsilon.$$

We write $f = \lim_{n\to\infty} f_n$.

A function $g : X \to Y$ is bounded if for some (and hence - why? - any)
$y \in Y$ the set $\{\, d(y,g(x)) \mid x \in X \,\} \subset \mathbb{R}$ is bounded (VI.4.05). The uniform
metric on the set of bounded functions $X \to Y$ is defined by

$$d_U(f,g) = \sup\{\, d(f(x),g(x)) \mid x \in X \,\}.$$

(This sup exists by the boundedness of $f$ and $g$, the triangle inequality, and
Exercise VI.4.6.) Evidently convergence of $i \mapsto f_i$ in the uniform metric is
equivalent to uniform convergence (just combine the definitions).
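As a rough computational reading of the uniform metric, one can take the maximum of $d(f(x), g(x))$ over a finite sample of $X$. This only gives a lower bound for the supremum in general; the sample grid, the helper name `uniform_distance` and the test family $g_n(x) = x/n$ on $[0,1]$ are assumptions made for this sketch.

```python
def uniform_distance(f, g, points):
    """Largest |f(x) - g(x)| over a finite sample: a lower bound for d_U(f, g)."""
    return max(abs(f(x) - g(x)) for x in points)


def zero(x):
    return 0.0


# Sample of X = [0, 1]; it includes the endpoint 1, where |x/n| is largest.
grid = [j / 1000 for j in range(1001)]

# g_n(x) = x/n converges uniformly to 0 on [0, 1], with d_U(g_n, 0) = 1/n.
distances = [uniform_distance(lambda x, n=n: x / n, zero, grid) for n in (1, 10, 100)]
```

Here the sampled maximum happens to equal the true supremum because it is attained at a grid point; for a function like the $f_n$ of 3.01 a fixed grid would eventually miss the region where the distance stays large.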

3.03. Lemma. Let $X$ be a topological space, $Y$ a metric space. If $i \mapsto f_i$
converges uniformly to $f : X \to Y$ and the $f_i$ are continuous, so is $f$.

Proof. For $x_0 \in X$, $\varepsilon > 0$, choose $M \in \mathbb{N}$ such that $n > M$, $x \in X \Rightarrow
d(f(x), f_n(x)) < \varepsilon/3$, a number $m > M$, and by continuity of $f_m$ a neighbourhood
$U$ of $x_0$ such that

$$x \in U \;\Rightarrow\; d(f_m(x_0), f_m(x)) < \frac{\varepsilon}{3}.$$

Then for $x \in U$,

$$d(f(x_0), f(x)) \le d(f(x_0), f_m(x_0)) + d(f_m(x_0), f_m(x)) + d(f_m(x), f(x)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$

Hence $f$ is continuous at $x_0$. □

3.04. Lemma. Let $X$ be a topological space, $Y$ a complete metric space.
Then the space $F$ of bounded continuous maps $X \to Y$, with the uniform
metric, is complete.

Proof. Let $i \mapsto f_i$ be a Cauchy sequence in $F$. Then clearly for any $x \in X$,
$i \mapsto f_i(x)$ is a Cauchy sequence in $Y$, so since $Y$ is complete we may define
$f(x) = \lim_{n\to\infty} f_n(x)$. For $\varepsilon > 0$ we have $M$ with

$$m, n > M \;\Rightarrow\; d_U(f_m, f_n) < \frac{\varepsilon}{2}$$
$$\Rightarrow\; d(f_m(x), f_n(x)) < \frac{\varepsilon}{2}, \quad \forall x \in X$$
$$\Rightarrow\; d(f(x), f_n(x)) \le \frac{\varepsilon}{2}, \quad \forall x \in X \quad \text{by Exercise 1a}$$
$$\Rightarrow\; d_U(f, f_n) \le \frac{\varepsilon}{2} \;\Rightarrow\; d_U(f, f_n) < \varepsilon.$$

Hence $i \mapsto f_i$ converges uniformly to $f$, which is continuous by 3.03, bounded
by Exercise 1b, and hence in $F$. □

3.05. Corollary. If $B(y, \delta) = \{\, y' \mid d(y, y') \le \delta \,\} \subset Y$, and $Y$ is complete,
then so is the space $F'$ of continuous functions $X \to B(y,\delta)$, with the uniform
metric.

Proof. Evidently $F' \subset F$, so a Cauchy sequence $i \mapsto f_i$ in $F'$ has a uniform
limit $f \in F$. Moreover $x \in X \Rightarrow f(x) = \lim_{n\to\infty} f_n(x) \in B(y,\delta)$, since
$B(y,\delta)$ is closed. Hence $f \in F'$. □

If $X$, $Y$ have differential structures, so that the $f_n$ can be differentiated,
what can we say about the sequence $i \mapsto Df_i$? Even if $f = \lim_{n\to\infty} f_n$ and
all the $f_i$ are $C^\infty$, we may not get $\lim_{n\to\infty} Df_n = Df$. If, for example
(cf. Fig. A.3.2) $f_n : \mathbb{R} \to \mathbb{R} : x \mapsto \tfrac1n \sin(nx + n^2)$, then $i \mapsto f_i$ converges
uniformly to $0 : \mathbb{R} \to \mathbb{R}$ but for no $x$ does $i \mapsto Df_i(x)$ converge (why?). Likewise,
the uniform limit of polynomials may be nowhere differentiable. However:


Fig. A.3.2


3.06. Lemma. Let $X$, $Y$ be affine spaces with vector spaces $S$, $T$, $U \subset X$ be
open, and $L(S;T)$ have a metric $d$ given by a norm. If $i \mapsto (f_i : U \to Y)$ is
a sequence of $C^1$ functions converging pointwise to $f$ in the usual topology,
and $i \mapsto (Df_i : U \to L(S;T))$ converges uniformly to $F : U \to L(S;T)$, then
$f$ is also $C^1$ and $D_x f = d_{f(x)}^{\leftarrow} \circ F(x) \circ d_x$. (Recall $D$ notation, VII.1.02.)

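Proof. By Exercise 2 it suffices to choose affine charts $U \to \mathbb{R}^m$, $Y \to \mathbb{R}^n$ and
correspondingly $L(S;T) \to \mathbb{R}^{mn}$ (giving coordinate functions $f^j$, $f_n^j$, $\partial_i f_n^j$,
$F_i^j : U \to \mathbb{R}$ for $f$, $f_n$, $Df_n$, $F$) and the metric given in coordinates as

$$d([a_i^j],[b_i^j]) = \max\{\, |a_i^j - b_i^j| \;\mid\; i = 1,\dots,m,\ j = 1,\dots,n \,\}.$$

In this metric, uniform convergence of $i \mapsto Df_i$ to $F$ means

* For any $\varepsilon > 0$, $\exists M$ such that $n > M \Rightarrow |\partial_i f_n^j(x) - F_i^j(x)| < \varepsilon$, $\forall i, j, x$.

Since the $f_n$ are $C^1$, the $Df_n$ are continuous, hence by 3.03 so are
$F$ and the $F_i^j$. Hence (by Exercise VII.7c) existence of $D_x f$ follows if we
prove $\partial_i f^j(x)$ exists and equals $F_i^j(x)$, each $i$, $j$: likewise continuity. Fix
$x = (x^1,\dots,x^m)$.

For $\varepsilon > 0$, apply * to $\varepsilon/3$ to get $M \in \mathbb{N}$ such that

$$n > M \;\Rightarrow\; |\partial_i f_n^j(x) - F_i^j(x)| < \frac{\varepsilon}{3}, \quad \forall i, j, x$$

** $$\Rightarrow\; \left| \frac{d f_{ni}^j}{ds}(s) - F_i^j(s) \right| < \frac{\varepsilon}{3}, \quad \forall i, j, s$$

wherever $f_{ni}^j(s) = f_n^j(x^1,\dots,x^i + s,\dots,x^m)$, $F_i^j(s) = F_i^j(x^1,\dots,x^i + s,\dots,x^m)$
are defined.

By continuity of $F_i^j$, there exists $\delta > 0$ such that

$$|s| < \delta \;\Rightarrow\; |F_i^j(s) - F_i^j(0)| < \frac{\varepsilon}{3}.$$

Combining this with ** by the triangle inequality, we get

*** $$n > M,\ |s| < \delta \;\Rightarrow\; \left| \frac{d f_{ni}^j}{ds}(s) - F_i^j(0) \right| < \frac{2\varepsilon}{3}.$$

Now, suppose $r \in\, ]-\delta, \delta[$, $r \ne 0$, $n > M$, and

$$\left| \frac{f_{ni}^j(r) - f_{ni}^j(0) - r F_i^j(0)}{r} \right| \ge \frac{2\varepsilon}{3};$$

applying the Mean Value Theorem to the function $t \mapsto [f_{ni}^j(t) - f_{ni}^j(0) - t F_i^j(0)]$,
which is clearly $C^1$, this implies there exists $s$ between 0 and $r$,
hence with $|s| < \delta$, for which *** fails: contradiction. Therefore, letting
$n \to \infty$ and using the pointwise convergence of the $f_n$ to $f$,

$$0 < |s| < \delta \;\Rightarrow\; \left| \frac{f^j(x^1,\dots,x^i + s,\dots,x^m) - f^j(x)}{s} - F_i^j(x) \right| \le \frac{2\varepsilon}{3} < \varepsilon.$$

Taking the limit as $\delta \to 0$, we see that $\partial_i f^j(x)$ exists and equals $F_i^j(x)$. □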
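The hypothesis that the derivatives converge uniformly cannot be dropped, as the example given before the lemma suggests. The sketch below assumes that example reads $f_n(x) = \tfrac1n \sin(nx + n^2)$ (the coefficient $1/n$ is an assumption; any coefficient tending to 0 behaves the same way) and compares the size of $f_n$ with the values of $Df_n(0) = \cos(n^2)$. The grid and variable names are illustrative.

```python
import math


def f(n, x):
    """Assumed form of the example: f_n(x) = sin(n x + n^2) / n."""
    return math.sin(n * x + n * n) / n


def df(n, x):
    """Derivative of f_n with respect to x: cos(n x + n^2)."""
    return math.cos(n * x + n * n)


grid = [-3.0 + 6.0 * j / 600 for j in range(601)]   # sample of [-3, 3]
ns = range(1, 201)

# Sampled sup-norm of f_n: bounded by 1/n, so f_n -> 0 uniformly.
sup_norms = [max(abs(f(n, x)) for x in grid) for n in ns]

# Derivatives at x = 0 keep oscillating across most of [-1, 1].
derivs_at_0 = [df(n, 0.0) for n in ns]
```

The sampled norms shrink like $1/n$ while the derivative values keep swinging between values near $-1$ and near $1$, which is consistent with (though of course not a proof of) the claim that $i \mapsto Df_i(x)$ does not converge.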


Exercises A.3 

1. a) Deduce from Exercise VII.3.7 that if $d(x_m, x_n) < \varepsilon$ for all $m > M$,
      $d(\lim_{m\to\infty} x_m, x_n) \le \varepsilon$.

   b) Show by the triangle inequality that if $d_U(f_m, f) < \varepsilon$ and $f_m$ is
      bounded, then so is $f$.

2. a) Show that if $\|\ \|_S$, $\|\ \|_T$ are norms on finite-dimensional vector spaces
      $S$, $T$, then $\|A\| = \sup\{\, \|As\|_T \mid \|s\|_S \le 1 \,\}$ defines a norm on $L(S;T)$.

   b) Deduce from Exercise 1.3 that if $Df_i : U \to L(S;T)$ in 3.06 converge
      uniformly for one choice of norm, they converge for any. (This is false
      in infinite dimensions.)

4. Integrating Vector Quantities 

One final addition to the technical background (§1-3 can be found in much 
greater detail in various Analysis texts) before we prove VII.6.04: 

4.01. Definition. If $X$ is a vector space and $d : X \times X \to X$ its natural
affine structure (II.1.03), an indefinite integral for a curve $c : J \to X$ is a
curve $g : J \to X$ such that $d_{g(t)}(g^*(t)) = c(t)$, $\forall t \in J$. The definite integral
of $c$ from $a \in J$ to $b \in J$ is $\int_a^b c(s)\,ds = g(b) - g(a)$, where $g$ is an indefinite
integral for $c$. If $X$ is finite-dimensional and $c$ continuous, the existence of an
indefinite integral follows directly from the $\mathbb{R} \to \mathbb{R}$ case, since for any $a \in J$
and basis $b_1, \dots, b_n$ for $X$,

$$t \mapsto \left( \int_a^t c^i(s)\,ds \right) b_i$$

is an indefinite integral for $c$. The uniqueness of the definite integral follows
similarly.
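The componentwise recipe above translates directly into a numerical routine. The following is a minimal sketch, assuming the standard basis of $\mathbb{R}^3$, a composite trapezoid rule for each real-valued component, and an illustrative curve $c(s) = (\cos s, \sin s, 1)$; none of these choices come from the text.

```python
import math


def integrate_curve(c, a, b, n=20_000):
    """Composite trapezoid rule applied to each component of c : [a, b] -> R^k."""
    h = (b - a) / n
    # End points carry weight 1/2, interior points weight 1.
    total = [0.5 * (ca + cb) for ca, cb in zip(c(a), c(b))]
    for j in range(1, n):
        total = [acc + v for acc, v in zip(total, c(a + j * h))]
    return [h * acc for acc in total]


def c(s):
    # Illustrative curve in R^3, given by its components in the standard basis.
    return (math.cos(s), math.sin(s), 1.0)


# Exact value of the definite integral from 0 to pi is (0, 2, pi).
integral = integrate_curve(c, 0.0, math.pi)
```

Each component is integrated as an ordinary real function and the results are reassembled as a vector, exactly as in the formula for the indefinite integral.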


5. The Main Proof 

We prove a set of results that add up to VII.6.04. If you have difficulty 
following the argument, it may help first to read §6, where a similar but 
simpler use is made of the Fibre Contraction Theorem. 
5.01. Theorem. Let $Z$ be a finite-dimensional affine space with vector space
$T$, $U \subset Z$ be open, and $w : U \to T$ be $C^1$. Then for any $z_0 \in U$ there
exists $\varepsilon > 0$, a neighbourhood $N \subset U$ of $z_0$, and a continuous map $\phi_w :
N \times\, ]-\varepsilon,\varepsilon[\, \to U$ such that

(i) $\phi_w(z, 0) = z$, $\forall z \in N$

(ii) For any $z \in N$, $\phi_z : \,]-\varepsilon,\varepsilon[\, \to U : t \mapsto \phi_w(z,t)$ is differentiable and
$d_{\phi_w(z,t)}(\phi_z^*(t)) = w(\phi_w(z,t))$, $\forall t \in\, ]-\varepsilon,\varepsilon[$.

Proof. Pick coordinates, give $T$, $L(T;T)$ the corresponding square norms (as
in 3.06) $\|\ \|_T$ and $\|\ \|_L$, and $Z$ the metric $d(z, z') = \|d(z, z')\|_T$. Choose $b > 0$
such that $B_b = \{\, z \mid d(z_0, z) \le b \,\} \subset U$. $B_b$ is compact, so since $w$ is $C^1$, by
VI.4.11 there exist $m, l \in \mathbb{R}$ such that

$$m = \sup\{\, \|w(z)\|_T \mid z \in B_b \,\}, \qquad l = \sup\{\, \|Dw(z)\|_L \mid z \in B_b \,\}.$$

Choose $\varepsilon, \delta > 0$ such that $\varepsilon m + \delta < b$, $l\varepsilon < 1$; set $\lambda = l\varepsilon$, $B_\delta = \{\, z \mid d(z_0, z) \le
\delta \,\}$, $W = B_\delta \times\, ]-\varepsilon,\varepsilon[$. Let $X$ be the space of continuous maps $W \to B_b$,
with the uniform metric $d_U$. By 3.05, $X$ is complete. Define $f : X \to X$, for
general $\phi \in X$, by

$$f(\phi)(z,t) = z + \int_0^t w(\phi(z,s))\,ds.$$

(Since $\phi(z,s) \in B_b$ always, $\|w(\phi(z,s))\| \le m$, so by Exercise 1b,

$$\left\| \int_0^t w(\phi(z,s))\,ds \right\| \le m|t| < \varepsilon m.$$

So by the triangle inequality, $d(z_0, f(\phi)(z,t)) \le d(z_0, z) + d(z, f(\phi)(z,t)) <
\delta + \varepsilon m < b$, hence $f(\phi)(z,t) \in B_b$, $\forall (z,t) \in W$, and $f(\phi)$ is thus in $X$.)

Notice that any $\phi = f(\psi)$, $\psi \in X$, satisfies (i) automatically, and that
$d_z(\phi_z^*(0)) = w(z) = w(\phi(z,0))$, already. If say $\phi_1(z,t) = z$, $\phi_2 = f(\phi_1)$,
$\phi_3 = f(\phi_2)$, ... one would hope that $i \mapsto \phi_i$ would converge to $\phi$ still
satisfying (i), and (ii) for all $t \in\, ]-\varepsilon,\varepsilon[$, not just 0. (Fig. A.5.1 shows the
images of $\phi_1$ (namely just $B_\delta$), $\phi_2$, $\phi_3$ and $\phi_4$ with the images $a_1 = \{z\}$, $a_2$,
$a_3$, $a_4$ of the corresponding curves $t \mapsto \phi_i(z,t)$, which increasingly well
approximate a solution curve through a typical $z \in B_\delta$.) This does happen.
$\phi \in X$ is a fixed point of $f$ if and only if (with $N = B_\delta$) it satisfies (i) and (ii),
by an immediate check (for ii, just apply $\partial_t$ to both sides of $\phi = f(\phi)$ written
out fully) and $f$ is a $\lambda$-contraction: for $\phi, \psi \in X$,

Fig. A.5.1

$$d_U(f(\phi), f(\psi)) = \sup\{\, d(f(\phi)(z,t), f(\psi)(z,t)) \mid (z,t) \in W \,\}$$
$$= \sup\Bigl\{\, \Bigl\| \int_0^t w(\phi(z,s))\,ds - \int_0^t w(\psi(z,s))\,ds \Bigr\|_T \Bigm| (z,t) \in W \,\Bigr\}$$
$$= \sup\Bigl\{\, \Bigl\| \int_0^t \bigl( w(\phi(z,s)) - w(\psi(z,s)) \bigr)\,ds \Bigr\|_T \Bigm| (z,t) \in W \,\Bigr\}$$
$$\le \sup\Bigl\{\, \Bigl| \int_0^t \| w(\phi(z,s)) - w(\psi(z,s)) \|_T\,ds \Bigr| \Bigm| (z,t) \in W \,\Bigr\} \quad \text{by Exercise 1a}$$
$$\le \sup\Bigl\{\, l\, \Bigl| \int_0^t d(\phi(z,s), \psi(z,s))\,ds \Bigr| \Bigm| (z,t) \in W \,\Bigr\}$$

by the Mean Value Theorem on components of $w$, as in 3.06, and Exercise 1b,

$$\le l\varepsilon\, \sup\{\, d(\phi(z,s), \psi(z,s)) \mid (z,s) \in W \,\} = \lambda\, d_U(\phi,\psi).$$

Hence by the Shrinking Lemma $f$ has an attracting fixed point $\phi_w \in X$. □
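The successive approximations $\phi_1(z,t) = z$, $\phi_{n+1} = f(\phi_n)$ of this proof can be computed exactly in a simple case. The sketch below is an illustration under stated assumptions: $Z = T = \mathbb{R}$ and the vector field $w(z) = z$, for which $\phi_n(z,t) = z\,P_n(t)$ with $P_n$ a polynomial in $t$, so that one step of $f$ becomes "integrate the polynomial from 0 and add 1". The function names are hypothetical.

```python
from fractions import Fraction
import math


def picard_step(coeffs):
    """Send P(t) = sum coeffs[j] t^j to 1 + (integral of P from 0 to t)."""
    integrated = [Fraction(1)] + [c / (j + 1) for j, c in enumerate(coeffs)]
    return integrated


def picard_iterates(n):
    """Return [P_1, ..., P_n], where phi_k(z, t) = z * P_k(t) for w(z) = z."""
    p = [Fraction(1)]            # phi_1(z, t) = z, i.e. P_1(t) = 1
    out = [p]
    for _ in range(n - 1):
        p = picard_step(p)
        out.append(p)
    return out


def evaluate(coeffs, t):
    return sum(float(c) * t**j for j, c in enumerate(coeffs))


iterates = picard_iterates(12)
# The exact flow of w(z) = z is phi_w(z, t) = z * exp(t); compare at t = 1/2.
approx = evaluate(iterates[-1], 0.5)
```

The iterates are the Taylor polynomials of $e^t$, so they converge to the flow $z e^t$. For this $w$ the constant $l$ is 1, so the proof's requirement $l\varepsilon < 1$ corresponds to working with $|t| < \varepsilon < 1$, which the sample time $t = 1/2$ respects.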
5.02. Corollary. If $c : \,]-\varepsilon',\varepsilon'[\, \to U$ has

(i)' $c(0) = z_0$,

(ii)' $d_{c(t)}(c^*(t)) = w(c(t))$, $\forall t$

then $c(t) = \phi_w(z_0, t)$ wherever both are defined.

Proof. The above proof holds equally with $B_\delta$ replaced by $\{z_0\}$, $\varepsilon$ by $\min\{\varepsilon, \varepsilon'\}$.
Again $f$ is a $\lambda$-contraction, and $f(c) = c$ is equivalent to (i)' and (ii)', so the
result follows by the Shrinking Lemma and the uniqueness of attracting fixed
points. □
5.03. Corollary. If $c : J \to U$ has $c(t_0) = z_0$, and $d_{c(t)}(c^*(t)) = w(c(t))$
when both sides are defined, then $c(t) = \phi_w(z_0, t - t_0)$ when both sides are
defined.

Proof. $\tilde c : t \mapsto c(t + t_0)$ satisfies (i)', (ii)' of 5.02, hence $\tilde c(t) = \phi_w(z_0, t)$, so
$c(t) = \tilde c(t - t_0) = \phi_w(z_0, t - t_0)$. □

5.04. Corollary. If $\phi_w : N \times J \to U$, $\psi : N' \times J' \to U$ both satisfy (i) and
(ii) of 5.01, then

$$\phi_w|_{(N \cap N') \times (J \cap J')} = \psi|_{(N \cap N') \times (J \cap J')}.$$

Proof. For any $z \in N \cap N'$, $\phi_z$ and $\psi_z$ both satisfy (i)', (ii)' of 5.02. So
$\psi(z,t) = \psi_z(t) = \phi_z(t) = \phi_w(z,t)$ when defined. □

The $\phi_w$ existing by 5.01 is continuous by 3.03; more work is needed to
show it differentiable. We know that the "vector partial derivative" $\partial_t\phi :
W \to T : (z,t) \mapsto d_{\phi_z(t)}(\phi_z^*(t))$ exists and is continuous, being just $w \circ \phi_w|_W$.
It thus suffices by Exercise VII.7.1c to prove that if $\phi_t : B_\delta \to U : z \mapsto
\phi_w(z,t)$ then $(z,t) \mapsto D_z\phi_t$ exists and is continuous on $W$. The following
new proof of this is from [Sotomayor].

5.05. Theorem. The map $\phi_w : W \to U$ of 5.01 is $C^1$.

Proof. It suffices to show that

$$\partial_z\phi_w : W \to L(T;T) : (z,t) \mapsto d_{\phi(z,t)} \circ D_z\phi_t \circ d_z^{\leftarrow}$$

exists and is continuous.

Let $X$, $f$ be as in 5.01, $L(T;T)$ have the norm $\|\ \|_L$, and $Y$ be the space
of bounded continuous maps $W \to L(T;T)$ - i.e. candidates for $\partial_z\phi_w$ - with the
uniform metric. (Which we shall call $d'_U$, as we shall refer again to the metric
$d_U$ on $X$.)

$(Y, d'_U)$ is complete by 3.04.

We want a fibre map $F : X \times Y \to X \times Y$ that will make any $(\phi, \phi')$
converge to $(\phi_w, \partial_z\phi_w)$. To get this we want - certainly for $\phi_w$ of 5.01,
conveniently for general $\phi$ -

* $$F(\phi, \partial_z\phi) = (f(\phi), \partial_z(f(\phi))).$$

Evidently there are many such $F$'s, since * involves only a small (in many
senses) subset of $X \times Y$. But the simplest is given just by differentiating $f$:

$$\partial_z(f(\phi))(z,t) = D\Bigl( I + \int_0^t (w \circ \phi_s)\,ds \Bigr)(z)$$
$$= I + \int_0^t D(w \circ \phi_s)(z)\,ds$$

by Exercise IX.3.5 generalised (or for each component)

$$= I + \int_0^t \bigl( Dw(\phi(z,s)) \bigr) \circ \partial_z\phi(z,s)\,ds.$$

Then if

$$f_\phi(\phi')(z,t) = I + \int_0^t \bigl( Dw(\phi(z,s)) \bigr) \circ \phi'(z,s)\,ds,$$

and $\phi'$ is bounded by $k$, say, $\|f_\phi(\phi')(z,t)\|_L \le \|I\| + |t|\,l k < 1 + \varepsilon l k$, so $f_\phi(\phi')$
is also bounded, clearly continuous, and thus in $Y$. So we can satisfy * by
setting $F(\phi, \phi') = (f(\phi), f_\phi(\phi'))$. $F$ so defined satisfies the conditions of the Fibre
Contraction Theorem (2.04):

(a) $B_b$ is compact; so by Exercise 2, $Dw|_{B_b}$ is uniformly continuous. So for
$\phi'$ bounded by $k$, and any $\alpha > 0$ there exists $\mu > 0$ such that for $z, z' \in B_b$,

$$d(z,z') < \mu \;\Rightarrow\; \|Dw(z) - Dw(z')\|_L < \frac{\alpha}{\varepsilon k},$$

so

$$d_U(\psi,\theta) < \mu \;\Rightarrow\; d'_U(f_\psi(\phi'), f_\theta(\phi'))$$
$$= \sup\Bigl\{\, \Bigl\| \int_0^t [Dw(\psi(z,s)) - Dw(\theta(z,s))] \circ \phi'(z,s)\,ds \Bigr\|_L \Bigm| (z,t) \in W \,\Bigr\}$$
$$\le \sup\Bigl\{\, \Bigl| \int_0^t \frac{\alpha}{\varepsilon k}\, \|\phi'(z,s)\|_L\,ds \Bigr| \Bigm| (z,t) \in W \,\Bigr\} \le \alpha.$$

So $\psi \mapsto f_\psi(\phi')$ is (uniformly) continuous, for each $\phi' \in Y$.

(b) is the proof of 5.01.

(c) For $\phi', \psi' \in Y$, $\theta \in X$, we have

$$d'_U(f_\theta(\phi'), f_\theta(\psi'))$$
$$= \sup\Bigl\{\, \Bigl\| \int_0^t \bigl( Dw(\theta(z,s)) \bigr) \circ \bigl( \phi'(z,s) - \psi'(z,s) \bigr)\,ds \Bigr\|_L \Bigm| (z,t) \in W \,\Bigr\}$$
$$\le \sup\Bigl\{\, l\, \Bigl| \int_0^t \| \phi'(z,s) - \psi'(z,s) \|_L\,ds \Bigr| \Bigm| (z,t) \in W \,\Bigr\}$$
$$\le l\varepsilon\, \sup\{\, \|\phi'(z,s) - \psi'(z,s)\|_L \mid (z,s) \in W \,\} = \lambda\, d'_U(\phi',\psi').$$

Hence $F$ has an attracting fixed point $(\phi_w, \phi'_w)$. If $\phi_1(z,t) = z$, $\phi'_1(z,t) = I$
we have $\partial_z\phi_1 = \phi'_1$: inductively defining $(\phi_n, \phi'_n) = F(\phi_{n-1}, \phi'_{n-1})$ we have,
by *, $\phi'_n = \partial_z\phi_n$ for all $n$. So applying 3.06, $\partial_z\phi_w$ exists and equals $\phi'_w$, which
is continuous. □

5.06. Theorem. If $w$ is $C^k$, so is $\phi_w$.

Proof. For $\phi \in X$, $1 \le n \le k$, set $\phi^{(n)}(z,t) = D^n\phi_t(z) \in L^n(T;T) \cong
L^{n-1}(T; L(T;T))$, where it exists. Using the square norm $\|\ \|_n$ on $L^n(T;T)$
corresponding to the coordinates in 5.01, let $X$ be as above, $V^1 = Y$, and for
$i > 1$ let $V^i$ be the space of bounded continuous maps $W \to L^{i-1}(T;L(T;T))$
with the uniform metric given by $\|\ \|_i$. Set $X_n = X \times V^1 \times \cdots \times V^{n-1}$,
$Y_n = V^n$, and define $F_n : X_n \times Y_n \to X_n \times Y_n$ inductively by

= (F n _. # (.-D)(^ (b) )) , 

1 < n < Jb, 

where F\ = F as defined in 5.01 and, for t 1 ,..., t n_1 G T, 

(/(* *<-D)W < " ) )(i.i)) (*‘.*"■') 

D”-‘*‘»(0(Z, S ))(1‘.l"-‘)o *<0(,,,)(«-‘+ 1 ,...,«•)* 

G L(T;T) . 

Then (Exercise 3) 

* ®z{f (*/*)) = - » 1 < n < k 

and the proofs that each $f_{(\phi,\dots,\phi^{(n-1)})}$ is a $\lambda$-contraction and each map
$(\phi, \phi', \dots, \phi^{(n-1)}) \mapsto f_{(\phi,\dots,\phi^{(n-1)})}(\phi^{(n)})$ continuous are just as in 5.05. So
the hypotheses of the Fibre Contraction Theorem are satisfied for $F_2$,
appealing to 5.05 for (b), and hence inductively for $F_n$, $1 < n \le k$. Using 3.06,
we see that $\partial_z^n\phi_w$ exists and is continuous.

For existence and continuity of $D^k\phi_w$ it remains to show that mixed
partials up to order $k$ exist and are continuous. But $\partial_z\partial_t\phi_w = \partial_z(w \circ \phi) =
D(w \circ \phi_t)$, which exists and is continuous by hypothesis for $w$, 5.05 for $\phi_t$, and
the chain rule. Thus $\partial_t\partial_z\phi_w$ also exists and is continuous, by Exercise VII.7.1a.
Hence we have $\partial_z\phi_w$, $\partial_z\partial_t\phi_w$, $\partial_t\partial_z\phi_w$ and $\partial_t^2\phi_w = \partial_t(w \circ \phi)$ existing and
continuous, and therefore also $D^2\phi_w$. Then similarly

$$\partial_z\partial_t(\partial_z\phi_w) = \partial_z\partial_z(w \circ \phi)$$

which exists and is continuous since $w$ is $C^k$ by hypothesis and we have just
shown that $\phi_w$ is $C^2$. So $\partial_t\partial_z(\partial_z\phi_w)$ also exists and is continuous, and so
on. Inductively, $\phi_w$ is $C^k$. □

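5.07. Corollary. If $w$ is smooth, so is $\phi_w$. □

5.08. Conclusion. Theorem VII.6.04 is true as stated. Around $x \in M$
modelled on the affine space $Z$, take a chart $\theta : V \to Z$. Let $U = \theta(V)$, $w :
U \to T : z \mapsto d_z\bigl(D\theta(v_{\theta^{\leftarrow}(z)})\bigr)$; then if $v$ is $C^k$ so is $w$. If $\phi_w : B_\delta \times\, ]-\varepsilon,\varepsilon[\, \to B_b$
is the map of the above results, and $N = \theta^{\leftarrow}(B_\delta)$, then

$$\phi_v : N \times\, ]-\varepsilon,\varepsilon[\, \to M : (y,t) \mapsto \theta^{\leftarrow}\bigl(\phi_w(\theta(y), t)\bigr)$$

is well defined and a local flow for $v$ around $x$.

The uniqueness properties follow from 5.04, 5.03.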


Exercises A.5 

1. a) Show in the square norm, for any basis, on $T$, that for any $c : J \to T$
      with $0, t \in J$ we have

      $$I_t = \Bigl\| \int_0^t c(s)\,ds \Bigr\| \le \Bigl| \int_0^t \|c(s)\|\,ds \Bigr|.$$

      (The $|\ |$ is needed in case $t < 0$ makes the integral negative.)

   b) Deduce that if $\|c(s)\| \le m$ for all $s \in J$, $I_t \le m|t|$, hence if $J = \,]-\varepsilon,\varepsilon[$,
      $I_t < m\varepsilon$.

2. If $(X, d_X)$ and $(Y, d_Y)$ are metric spaces, $K \subset X$ is compact, and
   $f : K \to Y$ is continuous, show that $f$ is uniformly continuous.
   Namely, that for any $\varepsilon > 0$ there is a $\delta > 0$ such that
   $d_X(x, x') < \delta \Rightarrow d_Y(f(x), f(x')) < \varepsilon$ for any $x, x' \in K$. (Hint: if not,
   show that for some $\varepsilon > 0$ there are sequences $i \mapsto p_i$, $i \mapsto q_i$ in $K$
   with $\lim_{n\to\infty} (d_X(p_n, q_n)) = 0$ but $d_Y(f(p_n), f(q_n)) \ge \varepsilon$ for all $n$, and
   obtain a contradiction via the convergent subsequence property.)

3. a) Show that if $f : Z \to L(A;B)$, $g : Z \to L(B;C)$ are differentiable,
      where $Z$ is an affine space, $A$, $B$, $C$ vector spaces, then $h : z \mapsto
      g(z) \circ f(z)$ is also differentiable and $Dh(z)(t) = Dg(z)(t) \circ f(z) +
      g(z) \circ Df(z)(t)$.

   b) Deduce * in the proof of 5.06.
6. Inverse Function Theorem

The Fibre Contraction Theorem also gives an exceptionally clean proof (again
due to Sotomayor) of the Inverse Function Theorem (VII.1.04), as follows.

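Suppose $P$, $Q$ are affine spaces with vector spaces $S$, $T$, that $W \subset P$ is
open and $h : W \to Q$ is $C^1$ and has $D_p h : T_pP \cong T_{h(p)}Q$ for some $p \in W$.
Write $p(x) = d(h(x), A(x)) \in T$ for $x \in W$, where $A$ is the affine map from
$P$ to $Q$ with $A(p) = h(p) = q$, say, and linear part $A = Dh(p)$. So we have
$h(x) = A(x) + p(x)$, with $D_p p = 0$. We look for a neighbourhood $V$ of $q$
in $Q$ and $g : V \to U$ with $g(h(x)) = x$ where defined, in a similar form
$g(y) = A^{\leftarrow}(y) + q(y)$ for $y \in V$, with $q : V \to S$.

Give $S$ an arbitrary basis $s_1, \dots, s_n$, $T$ the basis $As_1, \dots, As_n$, and $S$,
$T$, $L(S;T)$ and $L(T;S)$ the corresponding square norms $\|\ \|_S$, $\|\ \|_T$, $\|\ \|_L$, and
$\|\ \|_{L'}$. (Note that $\|As\|_T = \|s\|_S$ $\forall s \in S$, $\|A\|_L = \|A^{\leftarrow}\|_{L'} = 1$.) By the
assumed continuity of $Dh$, there is an $\varepsilon > 0$ such that if $x \in B_\varepsilon = \{\, x \mid
\|d(p, x)\|_S \le \varepsilon \,\}$, then $\|Dp(x)\|_L \le \tfrac12$. Set $V = A(B_{\varepsilon/2})$. The difference $q$ of
$g$ from $A^{\leftarrow}$ should be small near $q$, so we take as space $X$ of candidates for
$q$ the continuous maps $\gamma : V \to S$ with $\|\gamma(y)\|_S \le \tfrac{\varepsilon}{2}$ for $y \in V$, with the
uniform metric $d_U$. By 3.05 $X$ is complete. Now if $g = A^{\leftarrow} + \gamma$,

$$h(g(y)) = y \iff A(A^{\leftarrow}(y) + \gamma(y)) + p(A^{\leftarrow}(y) + \gamma(y)) = y$$
$$\iff A(\gamma(y)) = -p(A^{\leftarrow}(y) + \gamma(y)).$$

Thus if we set $f(\gamma)(y) = -A^{\leftarrow} p(A^{\leftarrow}(y) + \gamma(y))$, then $f(\gamma) = \gamma$ if and only
if $h \circ g = I_V$. Evidently $f(\gamma)$ is continuous, so by Exercise 1 it is in $X$, and
$f$ is a map $X \to X$. Moreover $f$ is a $\tfrac12$-contraction:

$$d_U(\gamma, \gamma') = l \;\Rightarrow\; \|\gamma(y) - \gamma'(y)\|_S \le l, \quad \forall y \in V$$
$$\Rightarrow\; \|d(A^{\leftarrow}(y) + \gamma(y), A^{\leftarrow}(y) + \gamma'(y))\|_S \le l, \quad \forall y \in V$$
$$\Rightarrow\; \|p(A^{\leftarrow}(y) + \gamma(y)) - p(A^{\leftarrow}(y) + \gamma'(y))\|_T \le \frac{l}{2}, \quad \forall y \in V$$

by the Mean Value Theorem, since $\|Dp(x)\| \le \tfrac12$ for $x \in B_\varepsilon$, and $(A^{\leftarrow}(y) +
\gamma(y))$, $(A^{\leftarrow}(y) + \gamma'(y)) \in B_\varepsilon$. So

$$d_U(f(\gamma), f(\gamma')) = \sup\{\, \|f(\gamma)(y) - f(\gamma')(y)\|_S \mid y \in V \,\}$$
$$= \sup\{\, \|A^{\leftarrow}\bigl( p(A^{\leftarrow}(y) + \gamma(y)) - p(A^{\leftarrow}(y) + \gamma'(y)) \bigr)\|_S \mid y \in V \,\}$$
$$\le \frac{l}{2} = \tfrac12\, d_U(\gamma, \gamma').$$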


Thus by the Shrinking Lemma $f$ has an attracting fixed point $q$, and
$g = A^{\leftarrow} + q$ has $h \circ g = I_V$. If $q$ is differentiable so is $g$, with $Dg(y) =
A^{\leftarrow} + Dq(y)$, so let $Y$ be the space of candidates for $Dq$ (namely, the bounded
maps $V \to L(T;S)$) with the uniform norm. We want $F : X \times Y \to X \times Y$
with the property $F(\gamma, D\gamma) = (f(\gamma), D(f(\gamma)))$; since

$$D(f(\gamma))(y) = -D\bigl( A^{\leftarrow} \circ p \circ (A^{\leftarrow} + \gamma) \bigr)(y)$$
$$= -A^{\leftarrow} \circ \bigl[ Dp(A^{\leftarrow}(y) + \gamma(y)) \bigr] \circ (A^{\leftarrow} + D\gamma(y)),$$

we get it by setting $F(\gamma, \gamma') = (f(\gamma), f_\gamma(\gamma'))$, where

$$f_\gamma(\gamma')(y) = -A^{\leftarrow} \circ \bigl[ Dp(A^{\leftarrow}(y) + \gamma(y)) \bigr] \circ (A^{\leftarrow} + \gamma'(y)).$$

Now evidently (a) each $\gamma \mapsto f_\gamma(\gamma')$, $\gamma' \in Y$, is continuous. We have
proved (b) that $f$ has an attracting fixed point. Moreover (c) each $f_\gamma$ is a
$\tfrac12$-contraction:

$$d_U(f_\gamma(\gamma'_1), f_\gamma(\gamma'_2))$$
$$= \sup\{\, \| -A^{\leftarrow} \circ [Dp(A^{\leftarrow} + \gamma)] \circ (\gamma'_1 - \gamma'_2)(y) \|_{L'} \mid y \in V \,\}$$
$$\le \sup\{\, \|Dp(A^{\leftarrow} + \gamma)(y)\|_L\, \|(\gamma'_1 - \gamma'_2)(y)\|_{L'} \mid y \in V \,\}$$
$$\le \tfrac12 \sup\{\, \|(\gamma'_1 - \gamma'_2)(y)\|_{L'} \mid y \in V \,\} = \tfrac12\, d_U(\gamma'_1, \gamma'_2).$$

Thus applying the Fibre Contraction Theorem, if $\gamma_1(y) = 0 \in S$, $\gamma'_1(y) =
D\gamma_1(y) = 0 \in L(T;S)$, $\forall y \in V$, and inductively $(\gamma_n, \gamma'_n) = F(\gamma_{n-1}, \gamma'_{n-1}) =
(\gamma_n, D\gamma_n)$, we have an attracting fixed point $(q, q')$ with $i \mapsto \gamma_i$ converging
to $q$, $i \mapsto D\gamma_i$ converging uniformly to $q'$. So $q'$ is continuous by 3.03, and
$Dq$ exists and is continuous by 3.06, and $g$ is hence $C^1$.

We must show $g \circ h = I_U$, where $U = g(V)$, as well as $h \circ g = I_V$. $Dg(q)$
is the isomorphism $A^{\leftarrow}$ (Exercise 2), so by what we have proved above there
is a neighbourhood $U' \subset U$ of $p$ and $h' : U' \to V$ such that $g \circ h' = I_{U'}$. Then
$h(x) = h(g(h'(x))) = (h \circ g)(h'(x)) = h'(x)$ for $x \in U'$, so $h' = h|_{U'}$. Taking
$V' = h(U')$, we have the required neighbourhood of $q$ and $C^1$ map $g' = g|_{V'}$
with $g' \circ h = I_{U'}$, $h \circ g' = I_{V'}$.

Finally we must show that if $h$ is $C^k$, $k > 1$, so is $g'$. But if $k = 2$, $Dh :
TP \to TQ$ is $C^1$ and $D_{0_p}(Dh) : T_{0_p}(TP) \to T_{0_q}(TQ)$ is an isomorphism
(Exercise 3a), so $Dh$ has a local $C^1$ inverse $G : W \to TP$, where $W \subset TQ$
is a neighbourhood of $0_q$. Since $Dg'|_W$ is also a local inverse for $Dh$ (as
$Dh \circ Dg' = D(h \circ g') = D(I) = I$), $Dg'|_W = G$, so $Dg'|_W$ is $C^1$ and hence
$g'$ is $C^2$. Inductively, $g'$ is $C^k$ (Exercise 3c).
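The contraction used at the start of this section can be run pointwise to compute a local inverse. The sketch below is a one-dimensional illustration under assumptions made here: $P = Q = \mathbb{R}$, $h(x) = x + x^2/4$ near the point 0, so that the affine part is the identity and the remainder is $x^2/4$, whose derivative has size at most 1/2 on $[-1, 1]$. The iteration is $\gamma \mapsto -\text{remainder}(y + \gamma)$ started from $\gamma = 0$; the function names are hypothetical.

```python
import math


def h(x):
    # Illustrative map with h(0) = 0 and Dh(0) = 1 (affine part = identity).
    return x + 0.25 * x * x


def remainder(x):
    # h(x) minus its affine part; its derivative x/2 has size <= 1/2 on [-1, 1].
    return 0.25 * x * x


def local_inverse(y, steps=60):
    """Iterate gamma -> -remainder(y + gamma) from gamma = 0; return y + gamma."""
    gamma = 0.0
    for _ in range(steps):
        gamma = -remainder(y + gamma)
    return y + gamma


samples = [-0.5, -0.2, 0.0, 0.3, 0.5]          # points of V = [-1/2, 1/2]
inverses = [local_inverse(y) for y in samples]
# Closed form of the inverse branch through 0, for comparison.
exact = [-2.0 + 2.0 * math.sqrt(1.0 + y) for y in samples]
```

Each iteration halves the error at worst, so 60 steps reach machine precision; the comparison with the closed-form branch is only possible because this particular $h$ is a quadratic.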
Exercises A.6

1. a) Show by the triangle inequality that $y \in V$, $\|\gamma(y)\|_S \le \tfrac{\varepsilon}{2} \Rightarrow
      (A^{\leftarrow}(y) + \gamma(y)) \in B_\varepsilon$.

   b) Deduce by the Mean Value Theorem (compare proof of 3.06) that
      $\|p(A^{\leftarrow}(y) + \gamma(y))\|_T \le \tfrac{\varepsilon}{2}$.

   c) Deduce that $\|A^{\leftarrow} p(A^{\leftarrow}(y) + \gamma(y))\|_S \le \tfrac{\varepsilon}{2}$.

2. a) Show for any $(\gamma, \gamma') \in X \times Y$ that if $\gamma(q) = 0$, then $f(\gamma)(q) = 0 \in S$,
      $f_\gamma(\gamma')(q) = 0 \in L(T;S)$.

   b) Deduce that $Dg(q) = A^{\leftarrow}$.

3. a) Show that if $D_{0_p}(Dh)$ exists, so does $D_v(Dh)$ for every $v \in T_pP$.

   b) Show that if $h$ is $C^2$ and $D_p h$ is an isomorphism then so is $D_v(Dh)$
      an isomorphism for any $v \in T_pP$; deduce that so is $D_v(Dh)$ for any
      $v \in T_xP$, $x \in U'$.

   c) Deduce that $g'$ is $C^2$ over its whole domain $V'$, and, inductively, $C^k$.
—, globally 299

—, locally 299 
flow 196 

— existence 409 

— smoothness 411 
flux 355 

— of a 4-momentum 358 
force, Newtonian 346 

—, relativistic (4-) 367 
form, bilinear 66 
—, multilinear 98 
—, one- 102 
—, quadratic 76, 133 
forward curve 340 

— vector 340 
4-force 367 
4-momentum 348 
4-velocity 346 
frame, inertial 271, 342 
—, local Lorentz 374 

— of reference 62, 80 
free fall 376 
freeing map 44 
frequency 354 
fudge factor 380 
function 6 
functional 57 

galaxies leaving us 381 
Gauss-Bonnet Theorem 323 
Gaussian curvature 319 
geodesic 246 
—, closed 247 
—, crossed 248 
—, deviation 324, 373 
—, diangle 321 
—, <=> energy-critical 262 
—, equation 246, 248 
—, history 376 
geodesically complete 250 
geodesy 248 
geoid 248 
global 156, 160 
gradient of a function 174 
— vector 73
gravitational collapse 251 

— red shift 369, 397 
group, algebraic 292 

—, general linear 31, 282 
—, special linear 282 
—, symmetric 37 
—, velocity 353 


Hairy Ball Theorem 183, 300 
hairy formulae 316 
Hamiltonian 347 
Hausdorff 121 
history 341 
—, affine 342 
—, geodesic 376 
homology theory 320 
homeomorphic/ism 123 
horizontal curve 228 

— part 220 

— vector 214, 220 

— vector field 249 
Hubble 381 

hull, affine 45 
—, convex 51 
—, linear 20 
hyperboloid 284 
hyperdrive 274 
hyperplane, affine 45 

— vector 23 
hypersurface 166 

idempotent 31 
identity, additive 18 
—, Bianchi first 314, 315 
—, —, second 316, 317, 319 
—, Jacobi 204 

— map 10 

— operator 24 

—, polarisation 76 
—, tensor field 178 
image 7, 29 
—, inverse 9 
inclusion 8 
indefinite integral 193 
independence, affine 46 
—, linear 22 

index raising/lowering 110, 188 
induced metric 142 

— metric tensor 178 

— operator 95 

— topology 142 
inequality, Schwarz 70 


—, triangle 116 
inertial frame 271,342 

— observer 272, 342 
infimum 147 

infinitesimal connection 227 

— transformation 227 
injection 9 
injective 9 

inner product, standard 67 
—, space 67 
integrable 194 
integral curve 196 
—, definite 193 
—, divergent 193 
—, indefinite 193 
—, infinite 193 

— of vector value function 408 
Intermediate Value Theorem 136, 400 
intersection 3 

interval 2 

—, half-unbounded 2 
inverse, additive 18 

— map 11 

— operator 24 

Inverse Function Theorem 156, 415 

invertible 24 

isometry 79 

isomorphism, affine 54 

—, vector 24 

isotropic 96 

Jacobi identity 204 
Jacobian determinant 155 

— matrix 154 

kernel 29 

Klein bottle 164, 299, 323 
Kronecker 6 12 

A-contraction 401 
large-scale structure 

-of the earth 372 

-of spacetime 381 

least energy 256 

— length 268, 273 

— time 265 

Leibniz rule 159, 188, 216, 233, 241, 
243 

length 68 
—, -critical 262 

— of a curve 193, 262 
—, least 268, 273 

—, maximum 268, 270 
— of a vector 68
Lie algebra 292 

— group 281 

— bracket 200 
light cone 70 
lightlike 69 
limit 126 

—, pointwise 404 
—, -preserving 126 
—, uniform 406 
linear combination 21 

— connection 209 

— dependence 22 

— function 24 

— functional 57 

— map (ping) 24 

— operator 24 

— part 54 

line element 192 
line subspace 20 
local 156 

— flow 196 

— Lorentz frame 374 
Lorentz 14 

— frame 272 
—, local 374 
—, metric 67 

—, sign of 67, 237, 332

— space 70, 178 

— transformation 80 

manifold 161 

—, ^-pinched 282 

—, Einstein 333, 335, 380 

—, embedded 165 

—, Lorentz 178 

—, pseudo-Riemannian 178 

—, Riemannian 178 

—, smooth 161 

—, topological 163 

map, mapping 6 

mass relativistic 349 

— rest 348 
mass-energy 351 
—, density 358 

— of star 386 
matrix 26 

—, diagonal 92 
—, identity 28 
—, Jacobian 154 
—, similar 29 
matter tensor 358 
—, self-adjoint? 360 


maximal vector 93 
member 1 
metric 66,116 
—, diamond 135 
—, Euclidean 135 
—, Lorentz 67 
—, natural 117 

— square 135 

— tensor field 177 

— trivial 116 

— uniform 405 

— vector space 67 
Michelson-Morley experiment 13 
Minkowski space 178, 341 
mirage 273, 303 
momentum 346ff
multilinear map 98 

—, form 98 

multiplication, matrix 27 
—, scalar 18 

natural affine structure 45 

— metric 117 
neighbourhood 122 
non-degenerate metric tensor 66 

— subspace 67 

non-Euclidean geometry, elliptic 249 
—, hyperbolic 256 
norm 70 

— of an operator 94 
—, partial 70 
normalise 70 
n-sphere 4 

nullity 29 
—, affine 55 
null cone 70 

— curve 192 

— vector 69, 178 
number, natural 1 
—, real 2 

Oedipus 289 
Olga 32 
one-form 102 
open 118, 120 

— ball 117 

— map 125 

— in a subspace 142 
operator 24 

— algebra 32 
—, induced 95 
—, orthogonal 80 
—, unitary 80 
orbit, Keplerian 167
—, relativistic 392ff
orientation 38 

— dbt>e 366 

— preserving 39 

— reversing 39 

— time 340 
origin 19 

— of the universe 251, 383 
ordered n-tuple 4 
orthogonal complement 79 

— matrix 91 

— operator 80 

— projection 77, 82 

— set 85 

— vectors 71 
orthonormal set 85 

— basis 85 

paradox 119 

— twins 15, 263 

— time travel 289 
parallel postulate 249 

— subspace 46 

— transport 225 

— vector fields 222 
—, —, space of 299 
parallelisable 183 
parallelogram, infinitesimal 307 

— in Minkowski space 371 

— rule 19 

parametrised surface 258 
parameter 190 

parametrisation by arc length 191 

—, canonical 262, 264 

path 189 

—, -connected 190 

partial derivative 153 

—, mixed 203 

—, norm 70 

particle history 341 

—, Newtonian 350 

—, relativistic? 398 

perihelion 394 

permutation 36 

perpetual motion 369 

Pfaffian 184 

plane subspace 20 

planetary orbits 392 

point 1 

— of closure 117 
polarisation identity 76 
precession 395 


pressure 360 
Primum Mobile 376 
principal directions 96, 361 

— stresses 362 

product of affine spaces 180 
—, dot 66 
—, inner 66 
—, rule 37 
—, tensor 100 

— of vector spaces 180 
projection 32 

—, orthogonal 77, 82 
pseudometric 116 
pseudo-Riemannian 178 

quadratic form 76, 133 
quantum mechanics 265, 347 

range 7 
rank 29 
—, affine 55 
real line 2 

— n-space 19 

— number 2 
red shift 354 

— galactic 381 

— gravitational 369, 397 
relation 5 

— order 5 

— equivalence 5 

relativistic crystal symmetries 295 

— Simple Harmonic Motion 292 
relativity, Buddhist 340 

—, general 372 
—, Newtonian 376 
—, principle of 15 
—, special 341 
remarkable theorem 324 
reparametrisation, affine 190 

— by arc length 291, 264 
—, constant 190 

—, continuous 190 
—, smooth 190 
representative 5 
rest energy 351, 352 

— mass 348 
-, zero 351 

—, relative to frame ( F rest) 342 
—, relative to section (g-rest) 341 
—, velocity 271, 274, 342 
restriction of a map 8 

— of a vector field 219 
Ricci curvature 330 
— directions 333

— tensor 331, 377ff 

-, bivariant 333 

-, Schwarzschild 385 

-, sign 332 

— transformation 329, 331 
Ricci’s Lemma 242 
rolling 208 

— without turning 207, 209 

— or slipping 208 
rotation 80 

—, infinitesimal 305, 306, 319 

saddle point 265 
scalar 19 

— curvature 332 
—, sign of 333 
Schur’s Theorem 327 
Schwarz inequality 70 
Schwarzschild metric 386 

-, Christoffel symbols 387 

section of a bundle 176 

—, spacelike 341 
self-adjoint 81 
semimetric 116 
sequence 125 
—, Cauchy 400 
—, convergent 126 
set 1 

—, empty 2 
—, indexing 3 
shortest curve 268, 273 
Shrinking Lemma 401 
signature 87 
sign of forces 360 

— Lorentz metric 67, 237, 332 

— permutation 36 

— Ricci tensor 332 

— Riemann tensor 237, 382 

— scalar curvature 333 
similar matrices 29 
simple harmonic motion 293 
simple tensor 103 
simultaneous 342 
singleton 1 

singular 24 
size 70 

— of the universe 382 
skew-self-adjoint 307 
skew-symmetric 66, 112, 307 
smooth 152 

solution curve 196 
space, affine 43 


—, component 342 
—, inner product 67 
—, Lorentz 70, 178 
—, metric 116 
—, metric vector 67 
—, metrisable 121 
—, Minkowski 178 
—, projective 248, 382 
—, separation 342 
—, tangent 44, 163, 168
—, topological 121 
—, vector 18 
spacelike curve 192 
—, entirely 342 

— section 341 

— vector 69, 178 
spacetime 178 
—, static 379 
span 20 

sphere 4, 124, 161, 382 
stress(-energy) tensor 358 
sub-basis 134 
submanifold 166 
subsequence 126 
subset 1 

subspace, affine 45 
—, non-degenerate 67 
—, topological 143 

— vector 79, 179 
Sylvester’s Law of Inertia 87 
symmetric bilinear form 66 

— connection 229 

— group 37 

— operator 91, 93 
symplectic 67 

tangent bundle 174 

— curves 205 

— to a curve 190 

— space 44, 163, 168 

— vector 44, 168 

— vector field 177, 218 
tension 360 

tensor bundle 175 

— compound 103 

— constant 178, 244 

— curvature 311 
—, degree of 105 
—, Einstein 335 

—, energy-momentum 358 

— field 176 

— —, metric 177

—, matter 358 
—, metric 66

— product of functionals 100

— — — maps 104
— — — spaces 101, 102

— — — vectors 102
—, Riemann 311
—, Ricci 331, 377
—, stress(-energy) 358
—, simple 103
—, torsion 230

— on a vector space 105
—, Weyl 338, 378
Theorema Egregium 324
tidal forces 325

time component 342 

— difference 342 

— dilation 343 
—, greatest 271 
—, least 265 

—, oriented 340 
—, proper 271 
—, travel 289, 340 
timelike curve 192 

— vector 69, 178 
topological space 121 

— manifold 163 
topology 120 

—, algebraic 299, 300, 320, 323 
—, discrete 129 
—, metric 121 
—, open box 130 
—, pseudometric 121 
—, usual 131 
—, Hausdorff 121 
—, weak 129 
torsion 228 

— tensor 230 
trace 40 

traceless 282, 307, 337 
transformation formulae for connections 218

— — for tensor fields 187

— — for vector fields 186

translate 46 
translation 54 
transport 225 
transpose 60 
triangle equality 43 
—, geodesic 320 

— inequality 116 
trivial subspace 20 

— metric 116 
twins 15, 263 


unimodular 281 
—, infinitesimally 307 
union 3 
unit cube 34 

— square 33 

— vector 70 
universal cover 292, 383
universe collapse 251, 383 

— expanding 381 

— origin 251, 383

variation, first 259 

— formula 260 
—, second 265 
—, smooth 257 
vector 18 

—, bound 44 
—, contravariant 57 
—, covariant 57 

— field 170, 176 

— — along a curve 218

— — along a surface 258

— —, contravariant 176

— —, cotangent 177

— —, covariant 177

— —, horizontal 249

— —, tangent 177, 218

—, forward 340 

—, free 44 
—, gradient 73 
—, horizontal 214 
—, maximal 93 
—, null 69, 178 
—, space 18 
—, spacelike 69, 178 
—, tangent 44, 168 
—, — to a curve 190 
—, timelike 69, 178 

— unit 70 

—, vertical 214, 220 
—, zero 18 
velocity, 4- 346 

—, group 353 
—, offence 13 

— relative to frame 342 
—, rest 342 

vertical part 220 

— vector 214, 219 
violet shift 354 
volume 34, 99, 113, 366 
—, positive 366 

wave number 354
Weyl tensor 338, 377, 384
world-line 247, 341 

zero subspace 20 
— vector 18 


“And further, by these, my son, be admonished: 
of making many books there is no end; 
and much study is a weariness of the flesh.” 

Ecclesiastes 12, 12 